A priceless resource


The key to treatments for autism and schizophrenia could lie in the brains of recently deceased 
children. To make advances, researchers need access to an international bank of donated material. 


Few subjects in modern science are as emotive as research on the brains of recently deceased children. The pay-off may
seem vague — progress towards understanding, and perhaps 
treating, neurological conditions including autism and schizophrenia. 
The difficulties, however, are clearer — the brains must be donated by 
grieving parents who have just lost a child in sudden and sometimes 
violent circumstances. 

So it is easy to understand the reluctance of individual scientists, 
institutions and funding agencies to press too vocally for access to 
more brains from newborns, infants and older children, as well as 
for fetal brains obtained after abortion. Still, as a News Feature on 
page 442 shows, some patient groups in the United States are raising
their voices. These groups deserve support. Moreover, they need
scientific organizations in the United States and abroad to endorse 
and work towards a more ambitious goal: an international tissue bank 
holding perhaps tens of thousands of brains from young children and 
human fetuses around the world. Nature today pledges its support 
for such a bank. 

The case for such a facility comes down to the growing number 
of scientists who wish to study brains from these early stages of 
development. Biological technologies now allow the extraction of a
wealth of information about neurological diseases caused by faulty 
brain development. Neurodevelopmental conditions such as autism, 
schizophrenia and bipolar disorder are a huge societal burden, yet 
there are few effective treatments for them. Schizophrenia alone 
costs the United States tens of billions of dollars each year. 

Many scientists working on these diseases have no access to young 
brain tissue. An informal survey of existing brain banks shows that 
they hold tissue from barely 1,300 brains collected from fetuses, young 
children and teenagers. 

How can the supply to science of such sensitive material — the very 
seat of a child’s personality — be increased on an international scale? 
The supply shortage is partly down to logistical problems. But these 
have been solved by some adult brain banks and, in principle, the 
logistics are no different when it comes to children. 

The largest obstacle — real or perceived — remains the sensitivity 
of the subject and the difficulty of raising it with parents. Ultimately, 
this must be confronted, and discussions in the United States are 
being led by autism advocacy groups, who are used to talking to 
distressed parents. They are working to convince parents of the 
value of donating a child’s brain, should the child die in an accident. 
In parallel, the US National Institutes of Health (NIH) has signed 
up in principle to establish a nationwide network of brain collection
that will actively include those of children, and serve wider
biomedical communities. 

Things are moving slowly, but at least they are moving. And it is not 
too early to think how this slow domestic progress could be geared up 
to an international scale. More countries means more donors, and the
high natural variability between brains means that very large numbers 
can be needed to give studies of brain material statistical significance. 
International networking of brain banks has been done before. 
BrainNet Europe, for example, was established 10 years ago as 
a single portal for brain tissue collected in 19 different European 
countries, none of which holds significant amounts of children’s 
brains. The US advocacy organization Autism Speaks has already 

added a UK collection point in Oxford to its own brain bank. 
Networked collections must also extend to fetal brains, and this is
another reason for an international approach. At least eight people,
including four doctors, associated with abortion procedures have been
murdered for their activities in the United States in the past 20 years, so
it is understandable that bodies such as the NIH don't make strong
public statements about extending fetal collections.

The Lieber Institute for Brain Development in Baltimore, Maryland,
dedicated to translational research in neurodevelopmental
diseases, which opened formally this year, is building a collection 
of young brains to support its research programme. It will begin its 
fetal acquisitions at three locations in Europe — Scotland, Denmark 
and Bulgaria — where the political climate is less hostile. In fact, 
the Lieber Institute is proving a model for how to move ahead on
paediatric and fetal brain banking, with appropriate respect for personal
and political sensitivities. Would it cooperate with a national or
international brain bank network, if created? Yes, say its organizers.
But it won't wait for it. There is too much important research that
needs to be done.




Animal talk 


Germany must do more to encourage dialogue 
on animal experimentation. 


Sandwiched between the towering edifices of the Bundestag and
the Chancellor's office, and just a short walk from other government
buildings, the old family villa that is home to the Swiss
embassy makes for a curious sight in the political heart of modern-day
Berlin. Last week, the embassy hosted an international meeting
of scientists from around the world who defend the use of animals
in research. But despite being invited, nobody from the government
offices bothered even to drop in. German animal-welfare groups also
declined to attend. That was unfortunate given that the gathering was
intended to discuss the principles of the Basel Declaration, which promotes
outreach by animal researchers to politicians and the public.
And something else failed to materialize — Germany’s plans to create 
a professional office to promote and implement the Basel Declaration 
principles, which some attendees had hoped would be announced at 
the meeting by the country’s research organizations. 

This lack of action reflects poorly on Germany’s proclaimed interest 
in creating an environment within which its generously funded biomedical
research can flourish. And it is disconcerting, because, like all
countries in the European Union (EU), Germany must translate into 
national law a complex and controversial directive that regulates the 
use of animals in research. 

The Basel Declaration was drafted at a meeting of mostly Swiss and 
German scientists last November. It has now been signed by nearly 
900 people, some 500 of whom came from other countries. The scientists
want the declaration to have the same authority over the ethics of
animal experimentation as the 1964 Declaration of Helsinki has over 
the ethics of human experimentation. The formal infrastructure being 
developed around the declaration could help to realize this ambition. 

The declaration was prompted by concerns over the EU animal-research
directive, early drafts of which were so unfriendly to
researchers that European scientists were shocked at how unprepared
they were to lobby in the same arena as animal welfare and
rights groups. Political battles raged for years before the directive was 
finally approved in 2010. Only one country abstained from what was 
otherwise a unanimous vote: Germany. 

Why? Germany handed prime responsibility for the directive to the 
agricultural ministry. Others in the government, notably the research 
ministry, disagreed with this approach and the two ministries could
not agree on much right up to the vote. The agricultural ministry is
now handling implementation without consultation with scientists.
Had representatives of the German government shown up at last
week's meeting, as their Swiss counterparts did, they would have heard 
from researchers how the loose wording of much of the directive could 
create difficulties for them while it is being implemented, and how 
they must therefore be consulted. 


For example, the directive requires that a ‘severity degree’
classification be introduced for all approved animal procedures. The
signatories to the Basel Declaration approve of this, but some government
offices in Europe have discussed whether an experiment should
automatically be given a higher severity grade if it uses animals that
have been genetically engineered, and whether the classifications
should be made public.

German animal-welfare groups could also be part of these debates 
— as Swiss ones are — but they rarely communicate with the research 
community. 

This relationship presents a challenge for German signatories to the 
Basel agreement and is a prime example of why Germany needs an 
office to coordinate the outreach the declaration calls for. The country’s
research funding organizations — particularly stalwarts such as
the Max Planck Society, the German Research Foundation and the 
Helmholtz Society — need to move swiftly to create such an office. 

Switzerland has dodged bullets aimed at its sturdy scientific base 
by animal-rights campaigners and opponents of genetic engineering 
in recent years, partly by maintaining excellent communication and 
transparency. Germany will find it even harder to bring these groups 
together — but even the longest journey must start with a short walk.


Scientific climate 


Results confirming climate change are welcome, 
even when released before peer review. 


Global warming is really happening — really. There was no
conspiracy or cover-up. Peer review did not fail and the scientists
who have spent decades working out the best way to handle
and process data turned out to know how to handle and process data
after all. Thank you, Berkeley Earth Surface Temperature (BEST) study.

Four papers released by the BEST team at the University of California,
Berkeley, last week are of undoubted interest to the media,
given that they support what is portrayed as the mainstream scientific 
position on climate change. They could also find traction in politics, 
especially in the United States, where they could be used to combat the 
assertions of Republicans, who have effectively tossed climate science 
away. But the headline scientific conclusion, that a century and a half 
of instrumental measurements confirm a warming trend, is, well, all
a little 1990.

Of course, reproduction of existing results is a valid contribution, 
and the statistical methods developed by the BEST team could be 
useful additions to climate science. But valid contributions and useful 
additions alone do not generate worldwide headlines, so the massive
publicity associated with the release of the papers (which were
simultaneously submitted to the Journal of Geophysical Research) is 
a curious affair. 

There was predictable grumbling at the media coverage from within 
the scientific community, which saw it as publicity in lieu of peer 
review. Reporters are more than happy to cover the story now, while 
it’s sexy, but will they cover it later, when the results are confirmed, 
adjusted or corrected in accordance with a thorough vetting? The 
short answer is no, many of them will not. Barring an extraordinary
reversal of message, the wave of press coverage is likely to be only a 
ripple when the papers are finally published. And this is what upsets 
the purists: the communication of science in this case comes before 
the scientific process has run its course. 

Members of the Berkeley team revelled in their role as scientific 
renegades. Richard Muller, the physicist in charge, even told the BBC: 
“That is the way I practised science for decades; it was the way everyone
practised it until some magazines — particularly Science and
Nature — forbade it.”

This is both wrong and unhelpful. It is wrong because for years 
Nature has explicitly endorsed the use of preprint servers and conferences
as important avenues for scientific discussion ahead of submission
to this journal, or other Nature titles. For example, on page 493
this week we publish a paper that discusses the dwarf planet Eris, based 
on results that the lead author presented (with Nature’s knowledge and 
consent) at a conference several weeks ago. Journalists are, of course, 
welcome to report what they come across in such venues — as several 
did on Eris. What Nature discourages is authors specifically promoting 
their work to the media before a peer-reviewed paper is available for 
others in the field to read and evaluate. 

Muller's statement is unhelpful because such inflammatory claims 
can only fuel the heated but misguided debate on climate-sceptic blogs 
and elsewhere about the way science works and how it treats those who
insist on viewing themselves as outsiders. 

To solicit input on results before publication is not a guerrilla action
against a shadowy scientific elite. Witness the posting on a preprint 
server last month of the paper reporting neutrinos that apparently 
travel faster than light: the authors made it clear that they were seeking 
help from the wider community to explain the
findings, and the media stories (if not the headlines)
mostly reflected that. To pretend otherwise
can only erode public trust in science, as it
is practised by all.
WORLD VIEW


Piecemeal cuts won’t add up to radical reductions


To meet ambitious emissions targets will require systems thinking and
massive breakthroughs in technology and fuels, says Jane C. S. Long.


In 2005, California threw down the gauntlet: by executive order, the
state must reduce greenhouse-gas emissions to 80% below what
they were in 1990, by 2050. Similar targets have been adopted in
Europe, but the California goal is well beyond any federal policy taken
on in the United States. Is it possible? What will it take to achieve it?
For two years, I was part of a group of energy experts in California
that tried to answer those questions. Our report, California’s Energy
Future — The View to 2050, was released by the California Council on
Science and Technology earlier this year.

For smaller emissions-reduction targets, tactical approaches such
as piecemeal reductions may look promising. But to ensure a radical
decrease in emissions while also reliably meeting its energy needs,
California must make strategic choices.

The difference may seem academic, but in fact it is hugely significant.
For example, if your net emissions target is not
near-zero, you might approach it by increasing
the use of biofuels in cars. But biofuels are
scarce. To achieve near-zero net emissions, you
must electrify the cars so that you can reserve
the biofuels for forms of transport that cannot be
electrified — heavy-duty trucks and planes, for
example. In a systems approach, using biofuels
in cars is a dead end.

Similarly, it is popular to promote extensive
use of wind power, with no worries about what
to do when the wind doesn't blow. Somehow the
problem just gets ‘solved’. Some say that we can
radically reduce emissions with only a major
emphasis on efficiency, or just by changing our
behaviour. But what if it doesn’t add up?

In our report, the California’s Energy Future
Committee looked at the big picture, asking which technical strategies
will achieve an energy system with near-zero emissions yet still
meet society's needs. We estimated how much more efficient buildings,
industry and transportation could become, and how quickly cars, buses,
trains and heat production could be electrified. We looked at how to
supply that electricity from near-zero-emissions sources: nuclear power,
fuel-based power plants used with carbon capture and storage technology,
and renewable energy. We also worried about emissions from ‘load
balancing’ in which generators are used to meet peak loads or fill in for
intermittent power from solar or wind sources. We assessed how much
biomass might be sustainably available to meet the remaining demand
for fuel, and how much it could help to cut emissions. We counted everything,
but only once. It was hard, but it was honest.

Having done the maths, what did we discover?

If California could very quickly replace cars,
appliances, boilers, buildings and power plants
with today’s state-of-the-art technology, replace
and expand current electricity generation with
non-emitting sources and produce as much biofuel as possible by 2050,
the state could reduce emissions a lot — by perhaps 60% below 1990 
levels. But it would have to replace or retrofit every building to very 
high efficiency standards. Electricity would have to replace natural gas 
for home and commercial heating. All buses and trains, virtually all 
cars, and some trucks would be electric or hybrid. And the state’s entire 
electricity-generation capacity would have to be doubled, while simultaneously
being replaced with emissions-free generation. Low-emissions
fuels would have to be made from California's waste biomass plus some 
fuel crops grown on marginal lands without irrigation or fertilizer. 

To reach an 80% cut will take new technology. 

Emissions-free electricity is one hurdle. California has plenty of 
renewable resources, but they are intermittent. Energy-storage technology
is not yet good enough to solve this problem, and no one knows
whether smart-grid technologies can. Using 
natural-gas generators to firm up the supply will 
mean falling short of the 80% goal. 

A reliable reinvented energy system should 
provide base-load power without intermittency 
or emissions. California should exploit all the 
geothermal energy it can. Carbon-capture 
schemes should focus not on coal-fired plants, 
but on lower-cost natural-gas plants, which 
produce fewer emissions to sequester. And the 
state should rethink its opposition to nuclear 
power. 

Even if the electricity problem can be solved, 
it won’t address the needs of planes, trucks, ships
and some industrial heating that cannot be electrified.
The state will still need fuel — about
three-quarters as much as today. California 
would be lucky to get half of that from biofuels. 

So there we are — a concerted effort to deploy known technology 
could cut emissions by more than half, but getting all the way down to 
80% cuts will almost certainly require major advances in near-zero-emissions
fuels. This is by far the biggest technology gap. The conclusion
may seem obvious, but few have really given this the hardheaded look it 
deserves. California can't just spend or deploy its way to an 80% reduction
or beyond — and neither can anywhere else.

We don't know precisely how economic and political factors will 
help or hinder progress towards the reduction target. But we are 
obliged to try to reach it, and we now know what it will take. This 
is not a small thing. We may not make the goal of radical emissions 
cuts by 2050, but it is important to get there eventually — or rather, 
as fast as we can.


Jane C. S. Long is principal associate director at large at the Lawrence 
Livermore National Laboratory in Livermore, California, USA. 
e-mail: janecslong@gmail.com 


27 OCTOBER 2011 | VOL 478 | NATURE | 429 


© 2011 Macmillan Publishers Limited. All rights reserved 


SEVEN DAYS


POLICY 


Science Europe 

A new Brussels-based lobby
group, Science Europe,
held its founding assembly
in Berlin on 21 October.
It hopes to become the
“single voice for science in
Europe”, its president, Paul
Boyle, told Nature in August
(see Nature 477, 18; 2011).
The organization (www.scienceeurope.org)
unites two
science-advocacy groups: the 
European Heads of Research 
Councils, which has now been 
officially dissolved, and the 
European Science Foundation, 
which is continuing as a 
separate body but may wind 
down its activities over the 
next few years. 


No carbon capture 
The United Kingdom's energy 
and climate-change agency 
has ditched plans to invest
£1 billion (US$1.6 billion) in
a project to capture and bury
millions of tonnes of carbon
dioxide produced each year
by a coal-fired power station
in Longannet, Scotland. The 
project, which aimed to be even 
larger than the US flagship 
FutureGen programme, was 
once a front runner in Britain's 
much-delayed competition to 
receive funding for carbon-capture
and storage schemes.
The money will be spent on 
other carbon-capture projects, 
the agency said on 19 October. 
See go.nature.com/xqnctd for 
more. 


RESEARCH


A planet is born
Astronomers have for the
first time imaged a planet so
young — a mere 2 million
years old — that it is still
gathering material from its
birth site, a disk of gas and dust
surrounding a star 145 parsecs
(473 light years) from Earth.
It lies about as far from its star
as Uranus does from our Sun.
The finding was published on
the arXiv preprint server on
17 October (A. L. Kraus and
M. J. Ireland preprint at
http://arxiv.org/abs/1110.3808v1;
2011), although it had
been shared at conferences
previously. See go.nature.com/
opal pd for more.


Earthquake strikes eastern Turkey
Several hundred people are feared to have
died in the magnitude-7.2 earthquake that
hit eastern Turkey on 23 October. As Nature
went to press, rescue teams were searching
for survivors in cities such as Ercis and Van,
near the Iranian border. The region, where
the Arabian tectonic plate converges with
Eurasia, is prone to devastating earthquakes:
a magnitude-7.6 quake in 1999 killed 17,000
people; and in 1976, a magnitude-7.3 quake
struck just 70 kilometres away from this week's
epicentre and killed several thousand.


Warming verified
An independent analysis of
the land-surface-temperature
record has concluded — if
anyone was in doubt — that
global warming is happening.
The Berkeley Earth Surface
Temperature (BEST) study, led
by Richard Muller, a physicist
at the University of California,
Berkeley, mostly agreed with
results from three teams
that had previously studied
data from land-temperature
stations, although it used
different statistical methods for
the analysis. The BEST analysis
was released on 20 October, but
no part of it has yet been peer
reviewed; the team is preparing
to submit four papers to the
Journal of Geophysical Research.
See go.nature.com/kqiypu and
page 428 for more.


Malaria vaccine
The world’s leading candidate
for a malaria vaccine has
claimed promising results
in a phase III trial, from
which the first findings were
published on 18 October.
But the RTS,S/AS01 vaccine,
funded by pharmaceutical
company GlaxoSmithKline
and the global PATH Malaria
Vaccine Initiative, showed low
efficacy against severe forms of
malaria, disappointing some
experts. See page 439 for more.


Hope for MS drug 


A monoclonal-antibody 
treatment for multiple 
sclerosis has seen positive 
results in a late-stage clinical 
trial. The results, presented 
on 22 October at a congress 
in Amsterdam, found that 
78% of patients treated with 
alemtuzumab remained
free from relapse (a flare-up
of inflammation) after two
years, compared with 59%
using one of the standard
therapies, interferon β-1a.
But evidence that the drug
can actually reverse nerve
damage was not statistically
significant (unlike in earlier
trials). Alemtuzumab is made
by Genzyme, a US subsidiary
of Paris-based drug maker
Sanofi. See go.nature.com/suucta
for more.


BUSINESS
Abbott splits up 


Abbott Laboratories is 
splitting into two companies: 
an as-yet-unnamed research- 
based pharmaceuticals firm, 
and a medical-products
business covering everything
from generic drugs to lab
diagnostics. The drug-maker,
based in Chicago, Illinois,
is the world’s ninth-largest
in terms of global revenues,
bringing in US$35.2 billion 
in 2010. The separation, 
announced on 19 October, is 
widely seen as an attempt to 
attract health-care investors 
to a business free of branded 
drugs, products on which 
some analysts feel the firm is 
overly dependent. 


Floods in Thailand 


Northern suburbs in
Thailand’s capital Bangkok
were last week inundated
by heavy floods (pictured),
described as the country’s
worst in half a century. As
Nature went to press, officials
were hoping that floodwaters
would retreat, and that
defensive walls and drainage
canals would save the centre of
the low-lying city from severe
damage. The floods, which
have been going on since late
July, have killed more than
350 people and caused billions
of dollars in damage.


TREND WATCH
According to the European
Commission’s 2011 scoreboard
for industry spending on research
and development (R&D), released
on 18 October, the world’s top
1,400 companies increased their
R&D investment by 4% last year,
after a 1.9% drop in 2009. Swiss
drug firm Roche, based in Basel,
spent €7.2 billion (US$10 billion)
on research — the most of any
firm. The rise broadly mirrors a
similar upturn in sales. Research
spending as a proportion of net
sales has fallen slowly over the past
half decade (see chart).


Europe GPS launch 
The European Space Agency 
has launched the first two 
operational spacecraft of 
Galileo, Europe's global 
positioning system. The 
satellites, launched on
21 October, joined two test
satellites already in orbit. The
network, costing more than
€5 billion (US$6.9 billion), will
feature up to 27 operational
satellites and three spares. See
go.nature.com/jvcbdl for more.


Stealing secrets 


A scientist who was born in 
China but is a permanent 
resident in the United States 
has pleaded guilty to economic 
espionage and stealing trade
secrets from two former US
employers to benefit Hunan 
Normal University in China. 
Kexue Huang admitted to 
passing on details about 
pesticides that he had learned 
when working at Dow 
AgroSciences in Indianapolis, 
Indiana, from 2003 to 2008. He 
also admitted to stealing a key 
component of a food product 
developed at his subsequent 
employer, grain distributor 
Cargill in Minneapolis, 
Minnesota, federal prosecutors 
said on 18 October. 


Journal chief 


Molecular biologist Inder 
Verma will be the next editor-in-chief
of the Proceedings
of the National Academy
of Sciences, the journal
announced on 19 October. 
Verma, currently at the Salk 
Institute in La Jolla, California, 
replaces Randy Schekman, 
editor since 2006, who is 
moving to edit a new open- 
access life-sciences journal 
(see Nature 475, 145; 2011). 


Head for NIH centre 
Cell biologist Chris Kaiser
will be the next director of the
basic-biosciences institute
at the US National Institutes
of Health in Bethesda,
Maryland, the agency
announced on 18 October.
Kaiser, who currently heads
the biology department at the
Massachusetts Institute of
Technology in Cambridge, will
lead the US$2-billion National
Institute of General Medical
Sciences from spring 2012. He
replaces acting director Judith
Greenberg, who has been
covering the role since former
director Jeremy Berg stepped
down in July. See go.nature.com/a2zmbi
for an interview
with Kaiser.


CORPORATE RESEARCH SPENDING RECOVERS
Firms increased their spending on research and development (R&D)
last year, although the recovery is largely in line with greater sales.
[Chart: growth in R&D spending (year-on-year) and R&D spending as a fraction of net sales.]


31 OCTOBER
The United Nations
proclaims that the world
population has reached
7 billion — see Nature
478, 300 (2011) for more.


1 NOVEMBER
China’s Shenzhou 8
spacecraft is rumoured
to launch. It is the
country’s first attempt
to remotely dock a craft
with its Tiangong 1
space module, which
launched last month.


2-4 NOVEMBER
Progress in using stem
cells to regrow livers,
lungs, kidneys and spinal
cords is discussed at the
World Conference on
Regenerative Medicine
in Leipzig, Germany.
go.nature.com/jsdbdy


UK physics head 
Britain's most financially 
troubled science funding 
agency, the Science and 
Technology Facilities Council, 
has a new chief executive:
John Womersley, who
was its director of science
programmes. He replaces Keith 
Mason, who was criticized for 
poor community engagement 
during years of dire financial 
straits for the council, which 
was founded by the merger of 
two councils in 2007 and funds 
mainly physics and astronomy 
research (see Nature 462, 396; 
2009). The appointment was 
announced on 18 October. 


The Fukushima accident led to mass evacuations from nearby towns such as Minamisoma.


NUCLEAR DISASTER 


Fallout forensics 
hike radiation toll 


Global data on Fukushima challenge Japanese estimates. 


BY GEOFF BRUMFIEL 


The disaster at the Fukushima Daiichi
nuclear plant in March released far more
radiation than the Japanese government
has claimed. So concludes a study¹ that combines
radioactivity data from across the globe
to estimate the scale and fate of emissions from
the shattered plant.

The study also suggests that, contrary to 
government claims, pools used to store spent 
nuclear fuel played a significant part in the 
release of the long-lived environmental contaminant
caesium-137, which could have been
prevented by prompt action. The analysis has 
been posted online for open peer review by the 
journal Atmospheric Chemistry and Physics. 

Andreas Stohl, an atmospheric scientist 
with the Norwegian Institute for Air Research 
in Kjeller, who led the research, believes that 


the analysis is the most comprehensive effort 
yet to understand how much radiation was 
released from Fukushima Daiichi. “It’s a very 
valuable contribution,” says Lars-Erik De Geer,
an atmospheric modeller with the Swedish 
Defense Research Agency in Stockholm, who 
was not involved with the study. 

The reconstruction relies on data from dozens
of radiation monitoring stations in Japan
and around the world. Many are part of a global
network to watch for tests of nuclear weapons
that is run by the Comprehensive Nuclear-Test-Ban
Treaty Organization in Vienna. The
scientists added data from independent stations
in Canada, Japan and Europe, and then
combined those with large European and 
American caches of global meteorological data. 

Stohl cautions that the resulting model is 
far from perfect. Measurements were scarce 
in the immediate aftermath of the Fukushima 


accident, and some monitoring posts were too 
contaminated by radioactivity to provide reliable 
data. More importantly, exactly what hap- 
pened inside the reactors — a crucial part of 
understanding what they emitted — remains a 
mystery that may never be solved. “If you look 
at the estimates for Chernobyl, you still have 
a large uncertainty 25 years later,” says Stohl.

Nevertheless, the study provides a 
sweeping view of the accident. “They really 
took a global view and used all the data 
available,” says De Geer. 


CHALLENGING NUMBERS 

Japanese investigators had already developed 
a detailed timeline of events following the 
11 March earthquake that precipitated the 
disaster. Hours after the quake rocked the six 
reactors at Fukushima Daiichi, the tsunami 
arrived, knocking out crucial diesel back-up 
generators designed to cool the reactors in an 
emergency. Within days, the three reactors 
operating at the time of the accident over- 
heated and released hydrogen gas, leading to 
massive explosions. Radioactive fuel recently 
removed from a fourth reactor was being held 
in a storage pool at the time of the quake, and 
on 14 March the pool overheated, possibly 
sparking fires in the building over the next 
few days. 

But accounting for the radiation that came 
from the plants has proved much harder 
than reconstructing this chain of events. The 
latest report from the Japanese government, 
published in June, says that the plant released 
1.5 × 10¹⁶ becquerels of caesium-137, an isotope
with a 30-year half-life that is responsible for
most of the long-term contamination from
the plant². A far larger amount of xenon-133,
1.1 × 10¹⁹ Bq, was released, according to official
government estimates.

The new study challenges those numbers. On 
the basis of its reconstructions, the team claims 
that the accident released around 1.7 × 10¹⁹ Bq
of xenon-133, greater than the estimated total
radioactive release of 1.4 × 10¹⁹ Bq from Chernobyl.
The fact that three
reactors exploded in 
the Fukushima accident 
accounts for the huge 
xenon tally, says De Geer. 

Xenon-133 does not 
pose serious health 
risks because it is not 
absorbed by the body or the environment. Caesium-137 fallout,
however, is a much greater concern because 
it will linger in the environment for decades. 
The new model shows that Fukushima released 
3.5 × 10¹⁶ Bq of caesium-137, roughly twice
the official government figure, and half the 
release from Chernobyl. The higher number is 
obviously worrying, says De Geer, although 
ongoing ground surveys are the only way to 
truly establish the public-health risk. 
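
As a rough illustration of why caesium-137 lingers for decades, the sketch below applies simple exponential decay using the 30-year half-life quoted above, and compares the model's estimate with the official one. It assumes physical decay only (no weathering, runoff or clean-up), and the function and constant names are illustrative. Because both caesium-137 estimates share the same power of ten, the ratio depends only on the leading figures 3.5 and 1.5.

```python
HALF_LIFE_YEARS = 30.0  # caesium-137 half-life quoted in the article


def remaining_fraction(years, half_life=HALF_LIFE_YEARS):
    """Fraction of the initial activity left after `years` of radioactive decay."""
    return 0.5 ** (years / half_life)


# Leading figures of the two caesium-137 estimates (same power of ten, in Bq).
OFFICIAL_MANTISSA = 1.5
MODEL_MANTISSA = 3.5

# How many times larger the modelled release is than the official figure.
ratio = MODEL_MANTISSA / OFFICIAL_MANTISSA

if __name__ == "__main__":
    print(f"model / official estimate: {ratio:.2f}")
    for years in (10, 30, 60, 90):
        print(f"after {years:2d} years: {remaining_fraction(years):.1%} remains")
```

Under these assumptions, half of the deposited caesium-137 is still present after 30 years and a quarter after 60, whichever release estimate turns out to be closer to the truth.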

Stohl believes that the discrepancy between 
the team’s results and those of the Japanese gov- 
ernment can be partly explained by the larger 
data set used. Japanese estimates rely primarily 
on data from monitoring posts inside Japan³,
which never recorded the large quantities of 
radioactivity that blew out over the Pacific 
Ocean, and eventually reached North America 
and Europe. “Taking account of the radiation 
that has drifted out to the Pacific is essential for 
getting a real picture of the size and character of 
the accident,” says Tomoya Yamauchi, a radia- 
tion physicist at Kobe University who has been 
measuring radioisotope contamination in soil 
around Fukushima. 

Stohl adds that he is sympathetic to the 
Japanese teams responsible for the official 
estimate. “They wanted to get something out 
quickly,” he says. The differences between 
the two studies may seem large, notes Yukio 
Hayakawa, a volcanologist at Gunma Uni- 
versity who has also modelled the accident, 
but uncertainties in the models mean that the 
estimates are actually quite similar. 

The new analysis also claims that the spent 
fuel being stored in the unit 4 pool emitted copi- 
ous quantities of caesium-137. Japanese officials 
have maintained that virtually no radioactivity
leaked from the pool. Yet Stohl’s model clearly
shows that dousing the pool with water caused 
the plant’s caesium-137 emissions to drop 
markedly (see ‘Radiation crisis’). The finding 
implies that much of the fallout could have been 
prevented by flooding the pool earlier. 

The Japanese authorities continue to maintain 
that the spent fuel was not a significant source 
of contamination, because the pool itself did 
not seem to suffer major damage. “I think
the release from unit 4 is not important,” says
Masamichi Chino, a scientist with the Japa- 
nese Atomic Energy Authority in Ibaraki, who 
helped to develop the Japanese official estimate. 
But De Geer says the new analysis implicating 
the fuel pool “looks convincing”. 

The latest analysis also presents evidence 
that xenon-133 began to vent from Fuku- 
shima Daiichi immediately after the quake, 
and before the tsunami swamped the area. 


RADIATION CRISIS

Modelling the first week of the Fukushima disaster
reveals that huge bursts of radioisotopes poured
from reactors and a spent-fuel storage pond.

[Plot: estimated rate of caesium-137 emission (GBq s⁻¹, log scale) against date, 10 to 24 March. Marked events: unit 3 explodes; unit 4 spent-fuel pool doused with water.]


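RADIOISOTOPE RECONSTRUCTION

After a massive earthquake and tsunami hit Japan on 11 March, three reactors at Fukushima Daiichi blew up and a
fourth caught fire. A reconstruction now shows how radioisotopes streamed from the plant and swept across the country.

Reactors exploded between 12 and 15 March, but radioactivity may already have been leaking out before the blasts.

From 11 to 14 March, winds blew most of the radioactivity over the Pacific Ocean.

On 15 March, a change in weather brought the radioisotopes back inland over Tokyo.

Precipitation along Japan's central mountain ridge then created a line of contamination seen by aerial surveys.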

This implies that even without the devastating 
flood, the earthquake alone was sufficient to 
cause damage at the plant. 

The Japanese government’s report has 
already acknowledged that the shaking at 
Fukushima Daiichi exceeded the plant's design 
specifications. Anti-nuclear activists have long 
been concerned that the government has 
failed to adequately address geological hazards 
when licensing nuclear plants (see Nature 448, 
392-393; 2007), and the whiff of xenon could 
prompt a major rethink of reactor safety assess- 
ments, says Yamauchi. 

The model also shows that the accident 
could easily have had a much more devastating 
impact on the people of Tokyo. In the first days 
after the accident the wind was blowing out to 
sea, but on the afternoon of 14 March it turned 
back towards shore, bringing clouds of radio- 
active caesium-137 over a huge swathe of the 
country (see ‘Radioisotope reconstruction’).
Where precipitation fell, along the country’s 
central mountain ranges and to the north- 
west of the plant, higher levels of radioactivity 
were later recorded in the soil; thankfully, the 
capital and other densely populated areas had 
dry weather. “There was a period when quite 
a high concentration went over Tokyo, but 
it didn’t rain,” says Stohl. “It could have been
much worse.”


Additional reporting by David Cyranoski and 
Rina Nozawa. 


1. Stohl, A. et al. Atmos. Chem. Phys. Discuss. 11, 
28319-28394 (2011). 

2. www.kantei.go.jp/foreign/kan/topics/201106/ 
iaea_houkokusho_e.html 

3. Chino, M. et al. J. Nucl. Sci. Technol. 48, 1129-1134
(2011).


Conflicting studies fuel
arsenic debate 


Uncertainties over contaminated groundwater in southern Asia highlight gaps in science. 


BY EUGENIE SAMUEL REICH 


It is often called the largest mass poisoning
in human history. Yet more than a decade
after the discovery that drinking-water
wells in Bangladesh are contaminated with
naturally occurring arsenic, researchers are
still struggling to understand the hydrology
and chemistry behind an environmental
crisis that affects more than 60 million people
in the Bengal Basin.

A paper¹ in the press in Geophysical Research
Letters adds a further piece to the puzzle. It
concludes that one factor, man-made ponds,
probably does not play a major part in the
release of arsenic into groundwater as some
geologists have claimed. Yet the mechanism
that is responsible remains elusive, in part
because of a lack of systematic studies of the
region affected by the problem.

Groundwater in sites across south and east
Asia contains arsenic at levels that can cause
chronic poisoning. Symptoms include skin
lesions, respiratory and cardiovascular disease
and a range of cancers. From the basins
of the Indus River in the west to the Yangtze
in China, “almost all the major rivers draining
the Himalayas are affected”, says Scott Fendorf
of Stanford University in California, a biogeochemist
who has studied the problem in
Cambodia. Residual tectonic activity from the
formation of the Himalayas and high rainfall in
the region combine to produce rapid erosion,
which sweeps arsenic, associated with iron
pyrite ores, along with iron oxide particles
into river sediment below.

The public-health consequences are most
stark in the Ganges delta of Bangladesh, where,
in the 1970s, aid agencies encouraged the drilling
of hand-pumped wells so that people did
not have to drink surface water contaminated
with waterborne microbial diseases such as
cholera. Geologists failed to realize that the
pumps would be tapping into arsenic-laced
aquifers beneath.


High levels of arsenic in drinking water threaten the health of millions of people living in the Bengal Basin.


But just how the arsenic gets into the groundwater
is not known. In 2008, a group led by
hydrologist Charles Harvey at the Massachusetts
Institute of Technology in Cambridge
reported² that water from an aquifer in the
Munshiganj district of Bangladesh has the same
isotopic fingerprint as pond water there, suggesting
that organic material may be seeping
from ponds into the aquifer 30 metres below.
Harvey proposed that the arsenic is released 
into the water supply when organic material 
from ponds triggers microbial activity, which 
can dissolve the iron oxide particles and release 
their load of arsenic into the water supply. 

The man-made ponds in Bangladesh, a by-product
of excavating earth to build up the
land against floodwater caused by the monsoon
rains, cover a 22-fold greater area than
they did 50 years ago, making them
a plausible suspect. Harvey and his colleagues
suggest that contamination might
be reduced if ponds were not dug near
wells and if wells were not drilled near existing
ponds.

But the finding proved controversial after a 
group led by geochemist John McArthur at Uni- 
versity College London — who failed to see the 
effect while studying arsenic contamination in 
West Bengal, India, in 2004 — criticized some 
of Harvey’s team’s methods³. McArthur and his
colleagues reanalysed Harvey’s data and arrived 
at the opposite conclusion. They argued that 
arsenic was being produced in sediments near
the aquifer itself by long-buried organic matter,
and that water from ponds was actually flush- 
ing out the aquifer and reducing contamination. 

The latest study¹ also suggests that the ponds
are not to blame. In their paper, Karen Johan- 
nesson of Tulane University in New Orleans, 
Louisiana, and her colleagues report that six 
arsenic-laden sites west of the Ganges in West 
Bengal fail to show signs of organic matter 
from ponds. The researchers also argue that 
organic matter from ponds would take thou- 
sands of years to penetrate through the tens of 
metres of clay and sand to the aquifers below, 
far longer than Bangladeshis have been dig- 
ging ponds. “I think we feel pretty confident, 
at least where we're working, that ponds are not 
contributing,” Johannesson says.

But Harvey’s and Johannesson’s groups 
worked at widely separated sites, where con- 
ditions may differ (see map), says hydrolo- 
gist Roger Beckie of the University of British 
Columbia in Vancouver. He suspects that some 
organic matter is coming from ponds whereas 
some is intrinsic to the sediments. “My gut feel- 
ing is that both processes are at work,” he says. 

Hydrologist Abhijit Mukherjee at the Indian 
Institute of Technology in Kharagpur, who is 
working with West Bengal’s government to 
find ways of predicting where safe wells
might be dug, says the dispute
shows that government and funding 
agencies need to support studies at 
multiple sites, from the Himalayas to 
the Bay of Bengal. “If that is done we 
have a systematic way of comparing 
data from sites,” he says. 

Whatever the origin of the arsenic, 
says Stephan Hug, a geochemist at 
the Swiss Federal Institute of Aquatic 
Science and Technology in Dübendorf,
one possible solution would be
to drill deep wells (to below around
150 metres) where the water should
be arsenic free. But then if too many
such wells are drilled, the arsenic could
be pulled down to that level, he adds.
Moreover, water from such depths can
be unacceptably saline.


TOXIC WATERS

Arsenic transported from the Himalayas by the Ganges and other rivers
has found its way into groundwater in the Bengal Basin, but three studies
using data from three separate locations (inset) disagree on how.

Mukherjee says that the main goal 
should be to learn how the arsenic 
is getting into the groundwater, so 
that researchers can make wells safe 
by ensuring that such conditions are 
avoided. The drinking water of millions 
of people depends on it.


1. Datta, S. et al. Geophys. Res. Lett. http://dx.doi.org/10.1029/2011GL049301
(2011).

2. Neumann, R. B. et al. Nature Geosci. 3, 46-52 (2010).

3. McArthur, J. M., Ravenscroft, P. & Sracek, O.
Nature Geosci. 4, 655-656 (2011).


US bill targets
grantee salaries 


Cost-saving measure would reduce maximum amount paid 
to biomedical researchers funded by federal agencies. 


BY MEREDITH WADMAN 


The US House of Representatives is
considering legislation that would roll
back the maximum amount of a grant
that can go towards the salary of a biomedical 
researcher funded by a federal agency. Under 
a spending bill proposed on 29 September, the 
‘salary cap’ would return to what it was roughly 
ten years ago. If Congress passes the measure, 
universities and other institutions will face 
a dilemma: dip into other funds to make up 
the difference and keep leading investigators 
happy, or risk losing researchers to industry, 
clinical practice or competing institutions. 

The 2012 spending bill would cut the salary 
cap by 17%, from US$199,700 to $165,300, for 
extramural scientists funded by the National 
Institutes of Health (NIH; see ‘Lowering the 
ceiling’), the Centers for Disease Control 
and Prevention and other agencies in the 
Department of Health and Human Services 
(DHHS). A parallel Senate bill leaves the 
salary cap untouched; the two versions must 
be harmonized before a final 2012 DHHS 
spending bill is passed. The measures will be 
considered as part of fraught budget negotia- 
tions for financial year 2012. 
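
The size of the proposed cut, and what it could mean for a single investigator, can be checked with a few lines of arithmetic. The sketch below uses the two cap figures from the bill. The rule that a grant pays salary only up to the cap, scaled by the fraction of effort charged to the grant, is a simplifying assumption for illustration, and the function names are hypothetical.

```python
OLD_CAP = 199_700  # current salary cap, US dollars (from the article)
NEW_CAP = 165_300  # cap proposed in the House bill (from the article)


def percent_cut(old, new):
    """Size of the reduction as a percentage of the old value."""
    return (old - new) / old * 100


def grant_payable(salary, cap, effort=1.0):
    """Salary a grant may cover: capped salary times fraction of effort (assumed rule)."""
    return min(salary, cap) * effort


def extra_institution_cost(salary, effort=1.0):
    """Additional amount an institution would have to cover if the cap fell."""
    return grant_payable(salary, OLD_CAP, effort) - grant_payable(salary, NEW_CAP, effort)


if __name__ == "__main__":
    print(f"cap reduction: {percent_cut(OLD_CAP, NEW_CAP):.1f}%")
    for salary in (150_000, 180_000, 250_000):  # illustrative salaries only
        print(f"salary ${salary:,}: extra cost ${extra_institution_cost(salary):,.0f}")
```

On these assumptions, anyone earning less than the new cap is unaffected, while an investigator earning at or above the old cap and fully funded by grants would leave the institution with a gap of $34,400 a year.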

Just how many scientists would be affected is 
not clear, but the cut would certainly hit many 
of the thousands of principal investigators who 
receive grants from the NIH. 


LOWERING THE CEILING 


The salary cap for US National Institutes of Health
extramural researchers could fall to 2002 levels.

Among medical schools and teaching
hospitals, “there is very strong concern about 
this proposal’, says David Moore, senior direc- 
tor for governmental relations at the Associa- 
tion of American Medical Colleges (AAMC) 
in Washington DC. “If you are a research- 
intensive institution and you're talking about 
hundreds of faculty where you have to make up 
this difference, we're talking about millions of 
dollars the institution is now responsible for.”

On 11 October, the AAMC co-authored 
a letter to congressional-spending lead- 
ers including Denny Rehberg (Republican, 
Montana), who authored the House bill and 
is chairman of the subcommittee that funds 
the DHHS. Signed by 111 organizations and 
institutions, the letter argues that lowering the
salary cap would be most damaging to those
investigators who dedicate the most time to 
research and therefore are most reliant on 
grants; would discourage gifted young scien- 
tists from entering research; and would drive 
physicians out of research even as demands for 
cures and therapies are growing. 

“Physicians have alternative ways of surviving,”
notes Moore. “If you continue to put more
and more disincentives into a research career,
at what point do those individuals say: ‘I’m
going to go off and see patients’?”

Rex Chisholm, vice-dean for scientific 
affairs and graduate education at Northwest- 
ern University’s Feinberg School of Medicine 
in Chicago, Illinois, says that trying to make 
up for the cut “would create real difficulties for 
Northwestern and most other medical schools. 
Most of us don’t have that kind of money just 
lying around.” 

He adds that in 2010, Feinberg, which is 
a signatory to the letter to Congress, paid 
nearly $8.3 million towards the salaries of its 
235 faculty members who earned more than 
the $199,700 cap. Had the cap been $165,300, 
he says, the medical school would have had to 
pay almost $12 million to 283 faculty members. 

Rehberg did not respond to Nature’s 
requests for comment. 

If the cut goes through, says Jeff Gerber, a 
physician-researcher at the Children’s Hospital 
of Philadelphia in Pennsylvania, “you're either 
going to have fewer [academic] positions avail- 
able because the institutions will only be able 
to pay so many people, or, I suppose, money 
sometimes drives people into different profes- 
sional choices”. Gerber says that his love for 
research will probably prevent him ever leaving 
it. But he expects hospitals to ask physicians to 
do less research and more income-generating 
clinical care if grant salaries are cut. 

Ironically, the House bill that mandates the 
cut is far more generous to the NIH overall 
than is its Senate counterpart. The House bill 
would boost the biomedical agency’s budget by 
$1 billion, or 3.3%, to $31.7 billion. The Senate 
bill would cut it by $190 million.


SOURCE: NIH


The RTS,S/AS01 candidate vaccine offers poor protection against severe malaria.


GLOBAL HEALTH 


Malaria vaccine 
results face scrutiny 


Experts question early release of incomplete trial data. 


BY DECLAN BUTLER 


“Malaria vaccine could save millions
of children’s lives”; “World’s first
malaria vaccine works in major
trial”; “Malaria vaccine almost here”. To judge
from last week’s headlines, scientists had made
a big breakthrough in the long campaign to
create a malaria vaccine, proving its effectiveness
with interim results from a huge phase III
clinical trial in Africa¹.

Yet several leading vaccine researchers, who 
are critical of the unusual decision to publish 
partial trial data, argue that the results raise 
questions about whether the RTS,S/AS01
candidate vaccine can actually win approval. 

RTS,S has been in development for some 
25 years, initially by the US military, and since 
2001 by a public-private venture between the 
PATH Malaria Vaccine Initiative (MVI) and 
the drug-maker GlaxoSmithKline (GSK), sup- 
ported by US$200 million in funding from the 
Bill & Melinda Gates Foundation. Bill Gates 
himself announced the interim results at the 
Gates Malaria Forum in Seattle, Washington. 

Gates’ speech and the MVI’s public-relations 
material were suitably circumspect about the 
results, but they were “immediately translated
into headlines about [reductions] in death and
mortality”, says Andrew Farlow, an economist
at the University of Oxford, UK, who has previously
assessed the RTS,S programme². “But the
data are not telling you that at all.”

Some researchers question whether the
results should have been published before all
the data were available; full results are expected
in 2014. Interim trial data are usually reported
only to regulatory authorities, and clinical trials
published only once all the data are in, noted
Nicholas White, a malaria expert at Mahidol
University in Bangkok, in an editorial³ accompanying
the interim results. “There does not
seem to be a clear scientific reason why this trial
has been reported with less than half the efficacy
results available,” he wrote.

The publication presents vaccine-efficacy 
data for infants aged 5-17 months, but not for 
those aged 6-12 weeks, who are the stated target 
of the trial: it is this group that would receive the 
malaria vaccine alongside routine immuniza- 
tions. The aim of the trial is to provide the World 
Health Organization (WHO) with the informa- 
tion it needs to consider licensing the vaccine, 
and recommend it for use in that age group. 
“What is the point of publishing the interim 
data on the 5-17-month-olds?” asks Stephen 
Hoffman, a veteran malaria researcher and chief 
executive of a rival vaccine effort, Sanaria, based
in Rockville, Maryland. 

The MVI’s director, Christian Loucq, argues
that the results were “robust enough to be
published. We decided this before we knew
the results; we felt it was our scientific and
ethical duty to make the results public when
they become available.”

One of the biggest claims made in the paper
is that RTS,S reduced the total number of episodes
of clinical malaria in the older group by 55.1%,
compared to controls. This measure of
efficacy is recommended for assessing a 
partially effective vaccine⁴. But the public
expects vaccine efficacy to describe pro- 
tection over a period of time, argues Judith 
Epstein, a captain and paediatrician at the 
US Military Malaria Vaccine Program in 
Silver Spring, Maryland. Recalculating the 
trial data shows that RTS,S protected just 
35-36% after 12 months, she says, add- 
ing that the paper should have presented 
both numbers. The study also showed no 
detectable impact on mortality, and it is too 
early to tell whether RTS,S actually protects 
against malaria, or merely delays infection. 

The paper did report that RTS,S reduced 
severe malaria by 47% in the older group. 
But combining that result with available data 
from the younger age group cut that number 
to 34.8% — meaning that for the youngest 
children, the benefit must be even smaller. 
“The real question mark is the 34.8% efficacy
in severe disease,” says Blaise Genton of the
Swiss Tropical and Public Health Institute
in Basel, and a member of the WHO technical
advisory group for RTS,S. The results
suggest that the vaccine might fall short of
expectations, laid out in 2006 by a WHO-led
consortium⁵, that it should have a “protective
efficacy of more than 50% against severe
disease and death and lasts longer than one
year”. “If it doesn’t reduce deaths, and has
only a modest effect on severe malaria,
these are going to be big questions for decision-makers
at WHO, GSK and the Gates
Foundation,” says Hoffman.

Another worrying finding is that the 
frequency of serious adverse events, such 
as convulsions and meningitis, was sig- 
nificantly higher in the vaccinated group, 
although the data are too preliminary to 
draw firm conclusions. “The severe disease
findings are a concern,” says Genton.

But Hoffman, like many researchers 
contacted by Nature, says that RTS,S still 
marks a significant achievement. It is the 
first vaccine against a complex multicellular 
parasite, Plasmodium falciparum, to consist- 
ently show a significant protective effect in 
large-scale trials. The phase III trial of RTS,S 
resulted in groundbreaking cooperation 
with African scientists, who led the 11 trials 
in 7 countries, says Hoffman. “I think that 
those teams deserve an incredible amount of 
recognition and congratulation.”


Fetal gene screening
comes to market 


Non-invasive procedure could make prenatal testing easier, 
but it comes with ethical problems. 


BY ERIKA CHECK HAYDEN 


Until last week, scrutinizing a fetus’s DNA
for indications of genetic abnormalities
meant tapping into the mother’s
womb with a needle. Now there's a test that 
can do it using a small sample of the mother’s 
blood. MaterniT21, a Down’s syndrome test 
that Sequenom of San Diego, California, 
launched in major centres across the United 
States on 17 October, is the first of several such 
tests expected on the market in the next year. 
It signals the arrival of a long-anticipated era of 
non-invasive prenatal genetic screening, with 
its attendant benefits and ethical complications 
(see Nature 469, 289-291; 2011). 

With the technology in place to sequence 
the fetal DNA carried in a pregnant woman's 
bloodstream, geneticists predict the list of con- 
ditions that can be detected by non-invasive 
means will grow rapidly. Another company, 
Gene Security Network of Redwood City, Cali- 
fornia, says its forthcoming test will also check 
for other genetic abnormalities, and Sequenom 
is studying the feasibility of expanding its test. 

“There's every reason to think that in the 
future you'll be able to extract an enormous 
amount of information from that sequencing 
data,” says Peter Benn, director of the Diagnostic 
Human Genetics Laboratories at the University 
of Connecticut Health Center in Farmington. 

Sequenom’s test sequences 36-base-pair 
fragments of DNA to identify sections from 
chromosome 21. Normally, the chromosome 
contributes 1.35% of the total maternal and 
fetal DNA in the mother’s blood. An overabun- 
dance of this material indicates the genetic 
abnormality that marks Down's syndrome. 
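
To give a feel for how small the signal is, the sketch below estimates the share of chromosome-21 DNA expected in the mother's blood when the fetus carries a third copy. It starts from the 1.35% baseline quoted above. The fetal fraction (the share of the DNA in the blood sample that comes from the fetus) is not given in the article, so the 10% used here is purely an assumed value, and the function name is illustrative.

```python
BASELINE_CHR21 = 0.0135  # normal share of DNA from chromosome 21 (from the article)


def expected_chr21_fraction(fetal_fraction, baseline=BASELINE_CHR21):
    """Expected chromosome-21 share when the fetus carries three copies.

    fetal_fraction is the assumed share of DNA in the sample that is fetal.
    """
    # A third copy adds half as much chromosome-21 DNA again to the fetal part.
    extra = 0.5 * baseline * fetal_fraction
    # The extra DNA raises both the chromosome-21 amount and the total amount.
    return (baseline + extra) / (1.0 + extra)


if __name__ == "__main__":
    assumed_fetal_fraction = 0.10  # hypothetical value, not from the article
    affected = expected_chr21_fraction(assumed_fetal_fraction)
    print(f"unaffected: {BASELINE_CHR21:.4%}")
    print(f"affected:   {affected:.4%}")
    print(f"relative increase: {affected / BASELINE_CHR21 - 1:.1%}")
```

With that assumed fetal fraction, the overabundance amounts to a relative increase of only about 5%, which is why the test relies on counting very large numbers of sequenced fragments.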

Sequenom is marketing its test as an add-on 
to current screening methods, which estimate 
the chance that a woman is carrying a fetus 
with Down's syndrome from ultrasound results 
and protein markers in the blood. Such non- 
genetic screening can detect 90-95% of Down's 
syndrome cases, but falsely indicates that up 
to 5% of women are carrying a baby affected 
by the condition. Sequenom’s test could be 
taken after a positive screening result to help a 
woman decide whether to undergo amniocen- 
tesis, a test that extracts amniotic fluid with a 
needle and carries a small risk of miscarriage. 
A study published this month, and paid for by
Sequenom, found that the company’s test has
a false positive rate of 0.2% (G. E. Palomaki 
et al. Genet. Med. http://dx.doi.org/10.1097/ 
GIM.0b013e3182368a0e; 2011). 
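
The effect of adding a second test can be illustrated by counting false positives in a hypothetical group of pregnancies. The sketch below uses the two false-positive rates quoted above (up to 5% for conventional screening, 0.2% for the sequencing test). The cohort size is invented for illustration, and the calculation assumes that the errors of the two tests are independent, which the article does not state.

```python
SCREEN_FALSE_POSITIVE_RATE = 0.05       # upper bound for conventional screening
SEQUENCING_FALSE_POSITIVE_RATE = 0.002  # rate reported for the sequencing test


def false_positives(unaffected_pregnancies, false_positive_rate):
    """Expected number of unaffected pregnancies wrongly flagged by a test."""
    return unaffected_pregnancies * false_positive_rate


UNAFFECTED = 100_000  # hypothetical cohort of unaffected pregnancies

# Women wrongly flagged by conventional screening alone.
after_screening = false_positives(UNAFFECTED, SCREEN_FALSE_POSITIVE_RATE)
# Of those, women still wrongly flagged after the follow-up sequencing test
# (assuming the two tests err independently).
after_follow_up = false_positives(after_screening, SEQUENCING_FALSE_POSITIVE_RATE)

if __name__ == "__main__":
    print(f"false positives after screening: {after_screening:.0f}")
    print(f"false positives after follow-up test: {after_follow_up:.0f}")
```

On these assumptions, most women who receive a false-positive screening result would be cleared by the follow-up test and would not need amniocentesis.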

It could spare some women from having 
amniocentesis after a false-positive screening 
result. But Benn says that the test will also pose 
difficulties. For instance, because it would take
8-10 days to get the results of Sequenom’s test, if
a woman did still opt for amniocentesis, and the
result confirmed that the baby has Down's syndrome,
there would be little time left to decide
whether to terminate the pregnancy. And some
women who test positive on MaterniT21 will 
probably choose to terminate pregnancies 
immediately rather than have amniocentesis. 

“Inserting this new test in the way that 
Sequenom is proposing is very difficult, from 
the patient perspective, and difficult for physi- 
cians and counsellors to manage,” Benn says.

Ethicists also caution that using such easy
screening methods ever
earlier in pregnancy
might worsen the gender
imbalance seen in countries
such as China and
India. And if it becomes
routine to check for many
different kinds of genetic
abnormalities, ethicists
predict that more couples
may face the quandary of
whether to carry an ‘unhealthy’ fetus to term.

“The idea that couples have choices about 
whether to continue their pregnancies may 
become strained because parents may be seen 
as irresponsible for allowing ‘defective’ pregnancies
to go to term,” says Mildred Cho, an
ethicist at Stanford University in Palo Alto, 
California. Other ethicists worry that fears of 
eugenics will be raised if testing can be done 
for less-serious conditions. 

Sequenom is solely focused on developing 
tests for conditions that are already part of 
prenatal screening programmes, says Mathias 
Ehrich, the company’s senior director for 
research and development diagnostics. “We 
do not want to invent new applications. Our 
focus is on making existing clinical applications
safer,” he says. “I don't think that we are
in a position to say that we should determine
what hair colour the baby has.”


REGENERATIVE MEDICINE 


IN FOCUS | NEWS 


European ban on stem-cell
patents has a silver lining 


Researchers can work without fear of action over patent infringement. 


BY EWEN CALLAWAY 


To hear European stem-cell researchers
tell it last week, you might have thought
that their world was ending. After the
European Court of Justice ruled on 18 Octo- 
ber that procedures involving human embry- 
onic stem (ES) cells cannot be patented, many 
responded with shock and dismay. 

“This is the worst possible outcome and it’s a 
disaster for Europe,” Oliver Brüstle at the University
of Bonn, Germany, told Nature shortly
after learning that the court had felled his 1999 
patent for a method of transforming human 
ES cells into neurons. Others said that with- 
out patent protection, few investors would pay 
to develop stem-cell therapies for conditions 
from neurodegenerative diseases to diabetes. 

But in the days following the ruling, 
lawyers, funders and researchers have taken a 
more moderate view. There are other ways for 
companies and scientists who commercialize 
ES cells to protect their inventions in Europe, 
they say. And some believe that a lack of pat- 
ents could speed up, rather than suffocate, 
innovation. “If anything the ruling is an opportunity,”
says physician scientist Chris Mason of
University College London. “It’s not the end of
stem cells in Europe.”

The decision by the European Court of 
Justice, which applies throughout the Euro- 
pean Union and cannot be appealed, stems 
from a 2004 lawsuit brought by Greenpeace. 
The Amsterdam-based environmental group 
challenged Brüstle’s patent on the grounds
that it offended public sentiment and violated
European law banning the industrial use of
human embryos. A German court agreed, and
by 2009 Brüstle’s appeal had reached Europe's
highest court (see Nature 462, 265; 2009). The
language in these legal rulings (that commercial
use of human embryos “would be contrary
to ethics and public policy”, for example)
alarmed scientists, who spoke out against the
court (A. Smith et al. Nature 472, 418; 2011).

The 13 judges of the court's Grand Chamber 
have now concluded that procedures involv- 
ing human ES cells cannot be patented if they 
derive from the destruction of embryos. The 
ban applies retrospectively, and contrasts 
sharply with the position in the United States, 
where scientists face few restrictions on patents 
relating to ES-cell applications. 


Embryonic stem cells: contrary to ethics and 
public policy? 


“Time will tell how serious it’s going to 
be,” says Nick Bassil, an intellectual-property 
lawyer at Kilburn & Strode in London, who 
represents companies developing stem-cell 
therapies. He adds that it may take years for the 
European Patent Office, national patent offices 
and courts to interpret the ruling. 

However, even a restrictive interpretation 
should allow companies to patent the tech- 
nologies needed to turn human ES cells into 
treatments, rather than patenting procedures 
involving the cells themselves. “If the sum total 
of this market were some cell lines, I would be 
deeply, deeply worried,” says Julian Hitchcock, 
a life-sciences lawyer at Field Fisher Water- 
house in London. Growth media, equipment 
and chemicals that help scientists to work 
with stem cells could all be patented in Europe 
without running afoul of the high court's rul- 
ing, he says. For instance, Peter Coffey at the 
Institute of Ophthalmology in London and his 
team are working with the drug giant Pfizer to 
develop a human-ES-cell-based treatment for 
macular degeneration, a progressive disease of 
the retina that causes blindness. Their patents 
cover the placement of their retinal cells in the 
eye, not the cells themselves. 

Rob Buckle, a programme manager at Brit- 
ain’s Medical Research Council (MRC) in Lon- 
don, agrees that investors will find other ways 
to protect their intellectual property, and adds 

that the ruling will not
[...]
also help to ward off copycats who might
otherwise exploit the lack of patent protection 
to rush their own versions of a treatment to 
market. By keeping many of their manufac- 
turing processes secret until they seek regu- 
latory approval, companies can ensure that 
knock-offs are unlikely, says Mason. “If I give
you my cell line, your chance of knowing what 
to do with it and copying what I do is zero,” 
he says. 

Many of the 20-year patents issued for ES- 
cell treatments will probably have expired by 
the time the treatments reach the clinic any- 
way, Mason adds. Indeed, the European Medi- 
cines Agency offers additional protection for 
inventions. The drug regulator keeps private 
for eight years any data that companies submit 
with their application for marketing approval, 
and blocks others from using this information 
for another two. 

The ruling may even turn out to be a boon 
for European stem-cell science, says Mason, 
creating an anything-goes atmosphere that 
could attract scientists from abroad. Non-com- 
mercial research is generally exempted from 
patent infringement claims, but many patents 
cover the cells’ use as research tools, creating 
uncertainty about which methods researchers 
are allowed to use, says Hitchcock. 

A January statement from the Hinxton 
Group, an influential consortium of scientists 
and ethicists, had expressed concern that stem- 
cell biology was becoming so thick with broad 
patents that key areas of the field were being 
walled off from scientists and entrepreneurs. 
“With patents gone, it’s much easier to do anything,”
says Mason.


Additional reporting by Alison Abbott. 


CORRECTION 

The News story ‘Angry words over East 
Asian seas’ (Nature 478, 293-294; 2011) 
wrongly implied that Climatic Change took 
a defined stance on the position of China’s 
border in the South China Sea. In fact, 
co-editor Michael Oppenheimer merely 
told the authors of a paper containing a 
contested map that the journal would make 
space for any amendments to the map that 
they may deem appropriate.


Brain child


Asking parents to donate a child’s brain to research is emotionally 
fraught. Some researchers say that it is time to put aside the taboos. 


BY ALISON ABBOTT 


David Amaral wanted to watch the young brain take shape.
He thought that studying post-mortem brains under the microscope
would help him to work out why children with autism
often have abnormalities in the key structures that drive emotion and
behaviour. But he soon found that existing brain banks couldn't give him
what he needed. “It’s just too hard to get high-quality tissue,” he says.
The banks may contain hundreds or even thousands of brains — but not
from children, and not necessarily in the best condition.

Amaral, who is director of research at the MIND (Medical
Investigation of Neurodevelopmental Disorders) Institute at the Uni-
versity of California, Davis, is not the only scientist eager for access
to brains from children. The crucial stages of brain development span
early fetal life through to the end of the teenage years; and destructive
neurodevelopmental disorders such as autism and schizophrenia are
thought to arise partly because of faulty connections laid down during
this time. Many researchers want to apply new technologies, including
increasingly sensitive molecular analyses and ever smarter microscopy,
to developing brains to create a dynamic picture of what goes wrong.

When they succeed, the results can be breathtaking, says neuro-
pathologist Joel Kleinman at the National Institute of Mental Health
(NIMH) in Bethesda, Maryland. In work reported in this week’s Nature¹,
he and his colleagues applied genomic technologies to 269 brains span-
ning the human lifetime and revealed an extraordinary wave of changes
in gene expression that occur as the human brain develops. “It’s like I
witnessed the poetry of birth,” he says.

But experiences such as Kleinman’s are rare, owing to the challenges
of collecting and storing children’s brains. Parents must give permission
shortly after their child has died, a time of inconsolable grief, and fetal
brains are available only after an abortion — an incendiary political
issue as well as an emotionally painful one for the women involved. 
Biomedical organizations have been tiptoeing around the delicacies for 
a decade or more. 

The solution, according to Amaral, is not complicated. Outreach 
programmes could be aimed at the coroners who conduct autopsies 
as well as at the families of children with brain disorders. They could 
explain the research value of donated brains and encourage families 
to sign up to a donor registry. A network of brain-collection centres 
around the United States could ensure that brains are preserved quickly. 
And centralized governance of the banks could direct tissue from each 
donated brain towards as much high-quality research as possible. “All 
it needs is for someone to take ownership of the issue,” Amaral says. 

That ownership may now be emerging from advocacy groups for 
neurodevelopmental disorders. “I know there has been a lot of talk 
and no action till now,” says neuroscientist Robert Ring, vice-president
of translational research at Autism Speaks, a research and advocacy 
organization based in New York. So Ring is pushing forward plans for 
a bank along the lines Amaral suggests. “Give us one year and we'll have 
developed a collaborative model with the scientific community,” he says.

Only two major brain banks store brains from children or fetuses 
and distribute them to the research community at large. One is run 
by the National Institute of Child Health and Human Development
(NICHD) and held at the University of Maryland
School of Medicine in Baltimore; the other, called
the Autism Tissue Program, is run by Autism
Speaks and is hosted at the Harvard Brain Tissue
Resource Center in Belmont, Massachusetts.

At most brain banks, including the NICHD’s, personnel typically call
the local coroner's office each morning. If a child is to be autopsied, they
ask the office's permission to contact the family and request the brain for 
research. But the few coroner’s offices involved can collect only a small 
amount of tissue. The Autism Tissue Program depends more on families 
that get in touch when they experience such a bereavement. Experts 
then go out to retrieve and prepare the brain. As the programme collects 
brains from across the United States, this often means a long journey. 
Ideally, though, the brain should be acquired quickly after death to mini- 
mize the breakdown of proteins and other molecules that researchers 
might wish to study. Other factors also influence tissue integrity, such 
as how soon after death a body is refrigerated and whether the person 
died slowly and painfully, as scientists have shown that this alters gene 
expression in the brain, making it less useful for research. 

Collecting fetal brains is also hard. Brains from spontaneous 
abortions can't be used for research because the fetus has generally been 
dead for many hours before it is expelled. In fact, brains can be collected 
from abortions only when labour has been induced medically, because 
surgical procedures tend to damage the tissue. 

Neither the NICHD bank nor the Autism Tissue Program bank — 
which together hold nearly 1,300 brains from people aged 19 and under 
— can meet the demand from researchers. Neuroscientist H. Ronald 
Zielke, director of the NICHD bank, says that he turns down 20% of 
requests for tissue because of a lack of material. In particular, this and 
other brain banks are running critically short — or have run out — of 
the brain areas that are the most interesting for research into devel- 
opmental disorders, says Zielke. That includes the amygdala, which 
processes emotion, and the prefrontal cortex,
which processes other cognitive and social
behaviours. A brain bank, like any tissue
repository, is also very expensive to run —
the annual direct costs for the NICHD bank
come to US$900,000.

To get around the shortage, some research- 
ers have built up collections for their own use. 
Kleinman’s research on gene expression drew 
on a collection that he heads at the NIMH. A similar study in this week's
Nature², led by Nenad Sestan from the Yale University School of Medi-
cine in New Haven, Connecticut, and with Kleinman as a co-author,
drew in part on a collection that Sestan has generated at Yale. Their study
showed the dramatic changes in gene expression that occur before and
shortly after birth (see ‘Brain waves’). Neonatologist David Rowitch at the
University of California, San Francisco, began a collection of brains at his
hospital, which led to a paper published in last week’s Nature³ showing
that the migration of ‘progenitor’ cells between two brain structures seen 
in infants slows down after the age of 18 months and has almost disap- 
peared by adulthood. He began collecting brains in 2008 with the support 
of the Howard Hughes Medical Institute in Chevy Chase, Maryland, 
and now has more than 100, most of which are from very young babies. 

These studies show how valuable such collections can be, but both 
Rowitch and Sestan describe the process of creating and running their 
own banks as “a big headache” because of the bureaucracy associated 
with handling human material. Sestan says that he would feel “much 
more comfortable” if the National Institutes of Health (NIH) were to 
run his collection. “It’s a huge effort for a small group and the NIH could 
do something on a larger scale,” he says. 


PUTTING BRAINS TOGETHER 


In fact, neuroscientists have been proposing for years that the NIH take 
a leading role in establishing a network of collection centres and stand- 
ardizing methods for brain collection and preservation. 

In July last year, Autism Speaks and the other major US foundation 
that funds autism work, the Simons Foundation in New York, made a 
formal proposal to the NIMH for a public-private partnership to col- 
lect brains from children with and without autism. The idea is that the 
advocacy groups would engage in intensive outreach efforts to potential
donors, particularly families who have a child with autism, and the NIH
would fund and manage the bank.

BRAIN WAVES
The activity of genes linked to neural development changes most dramatically
during prenatal development and infancy.

The NIH, though, has been slow to commit. Ring, who moved from 
the drug giant Pfizer to Autism Speaks in June this year and has the can- 
do air of someone used to industry deadlines, sees “a unique opportu- 
nity for the foundations to take on a leadership role”. His organization 
and the Simons Foundation are now in discussions with scientists to get 
agreement on scientific standards for the bank. He says that multiple 
collection centres will help to overcome geographical logistics, shorten- 
ing the time from death to collection, for example. 

Thomas Insel, director of the NIMH, says that the NIH already 
supports 11 brain banks related to different neurological disorders, 
and would like to adopt “a rational overall strategy rather than simply 
adding another boutique brain bank to the list”. He says that the NIH 
has now agreed in principle, at least, to create a ‘neurobiobank’ that
would include both adult and children’s brains. Although no firm plans 
have been released, the bank would probably have multiple collection 
points (the agency’s existing tissue banks would become ‘nodes’), but 
centralized oversight and tissue distribution. That is essentially what 
the advocacy groups want. 

However the banks are organized, the agonizing task of approaching 
bereaved families will remain. Yet autism researcher Cynthia Schu- 
mann, who earlier this year became director of an effort by the MIND 
Institute to start a bank of its own, says that her first encounters with 
families who choose to donate were eye-opening. “I have been blown 
away by how parents have thanked us — for helping them to handle grief 
with the opportunity to give something back to help autism research,” 
she says. Schumann, like counsellors at Autism Speaks, has also spent 
time educating affected families about autism research. “Parents often 
agree to sign up to a registry, and to encourage other families to sign 
up too,” she says. So the reluctance to ask parents about acquiring their 
children’s brains, she thinks, may be ill-founded. 

That seems to be reflected in the experience of Valerie Hund, who 
donated the brain of her 16-year-old son, Grayson, to the MIND Insti- 
tute after he died in January. Grayson had autism and epilepsy, and had 
died during a seizure. Hund says that a neighbour was a board member 
of the MIND Institute, and that her elder daughter had thought to call 
him shortly after Grayson died. The donation, says Hund, “helped me 
to cope through the process. I’m happy that Grayson is a pioneer in this.” 

Hund says she thinks that the institute's programme for raising aware- 
ness on brain and tissue banking is important. “It would have been easier 
for us if we had thought about donation in advance — but that is the last 
thing on your mind.”

SEE EDITORIAL P.427


Alison Abbott is Nature’s senior European correspondent.


Black Death decoded


The genome of a 660-year-old bacterium is revealing secrets from one of
Europe’s darkest chapters.


By Ewen Callaway 


As word of a brutal pestilence raging across Europe reached London,
its residents started digging. In 1348, Ralph Stratford, Bishop of
London, dedicated acres of land that had been purchased to bury the
legions of Black Death victims who would overwhelm existing church-
yard cemeteries. Within two years, one-third to one-half of the city’s
40,000-100,000 residents succumbed, and many thousands were buried
in two newly dug cemeteries at East and West Smithfield. At the height of the scourge,
200 bodies were interred each day. 

East Smithfield, originally called the Churchyard of the Holy Trinity, is one of a hand- 
ful of burial sites known to have been used only during the Black Death. In the 1980s, 
excavation of this ‘plague pit’ turned up nearly a third of the 2,400 bodies estimated to 
be buried there, some piled five deep. Despite the urgency of the time, the bodies were 
placed purposefully, oriented east to west, some with charcoal, possibly to absorb the 
fluids released during putrefaction, and many with coins and trinkets of their former 
lives. Such foresight not only helped keep corpses from piling up in the streets, but also, 
it seems, afforded some Black Death victims a dignified Christian burial. Six-and-a-half 
centuries later, it would also give scientists the opportunity to dissect the disease that laid 
waste to Europe (see ‘Death on the march’).

This month, geneticists reported that they have reconstructed the genome of Yersinia pestis,
the bacterium that causes bubonic plague, recovered from remains at East Smithfield¹.

DEATH ON THE MARCH


In the 1340s, a pestilence originating in Western Asia spread 
rapidly across Europe. Before it overtook London in 1348, land 
was set aside in East Smithfield to bury the dead.

The sequence — the first from an ancient
bacterial pathogen — may help to explain how
a disease could wreak so much havoc. It also
marks a renaissance in genetic studies of ancient
diseases, a field that has suffered a controversial
history but that is now being revitalized. “There
will be a race now for all the ancient pathogens,”
says Hendrik Poinar, a palaeogeneticist at 
McMaster University in Hamilton, Canada, 
who co-led the sequencing efforts. 


PLAGUED WITH DISBELIEF 

When Alexandre Yersin linked Y. pestis to 
bubonic plague in 1894, many scientists 
surmised that the pathogen was behind not 
only the Black Death, but also a spate of ear- 
lier mass die-offs. The sixth-century Justinian 
plague devastated Constantinople and killed 
millions in Europe and the Near East. Plagues 
reared their heads periodically for the next two 
centuries. Black Death itself reappeared several 
times, even into the nineteenth century. 

Clues tying Y. pestis to these outbreaks came 
largely from historical accounts of their symp- 
toms, such as Giovanni Boccaccio’s description 
of the Black Death in The Decameron, written 
around 1350: “It first betrayed itself by the 
emergence of certain tumours in the groin or 
the armpits, some of which grew as large as a 
common apple, others as an egg.” 

But some modern historians and scientists 
came to doubt that Y. pestis caused these ancient 
outbreaks. Bubonic plague epidemics known to 
have been caused by Y. pestis in the past cen- 
tury seemed too mild to have been caused by 
the same culprit as the Black Death: they killed 
fewer people and spread more slowly. Some 
‘plague revisionists’ have argued that fleas, 
which spread Y. pestis to humans, would have 
struggled to survive the cold temperatures 
reported during the Black Death. And there 
was the speed with which it killed — Boccaccio 
reported that death often occurred within three
days of the first symptoms appearing. Anthrax
or a haemorrhagic-fever-causing virus similar
to Ebola would be more likely than plague to
cause such a rapid demise, say critics.

In the late 1980s, large areas of the East Smithfield
cemetery were excavated, revealing the remains of 762
individuals.

DNA evidence would seem to offer a 
definitive answer. In 2000, a team led by Didier 
Raoult, a microbiologist at the University of 
the Mediterranean in Marseilles, France, said 
it had proved the link between the bacterium 
and the disease. The researchers reported² that
they had successfully recovered Y. pestis DNA
from the teeth of a child and two adults dug up
from a fourteenth-century mass burial site in
Montpellier. The team identified the bacterium
using a sensitive technique called the polymer-
ase chain reaction (PCR) to amplify a portion of
a gene from Y. pestis called pla. “We believe that
we can end the controversy,” the team wrote².
“Medieval Black Death was plague.” 

But several critics raised concerns about 
contamination. The PCR might instead have 
amplified DNA from modern Y. pestis used 
previously in the lab, or possibly the sequences 
from a closely related soil-dwelling bacte- 
rium. “I could never, ever replicate it,” says 
Thomas Gilbert, an evolutionary geneticist at 
the University of Copenhagen in Denmark. 
In 2004, Gilbert and his colleagues reported 
no trace of Y. pestis DNA in 108 teeth from 61 
individuals found in plague pits in France, Denmark
and England (including East Smithfield)³.

Raoult says that there was no contamination 
and that Gilbert's methods did not accurately 
replicate his⁴. Still, those who were already
sceptical of the suggestion that Y. pestis caused 
the Black Death latched on to Gilbert’s study. 

Other studies of microbial DNA extracted 
from ancient human remains — including 
those affected by tuberculosis, syphilis and 
malaria — were also being scrutinized. In 
several cases, researchers could not replicate 
results, or they found methodological shortcomings.
Critics said that DNA from these
samples was too degraded by heat, moisture
and time to detect, and the field soon divided 
into believers and sceptics. 

“There was a complete schism,” says Ian 
Barnes, a palaeogeneticist at Royal Holloway 
University of London, who says he spent two- 
and-a-half years trying — unsuccessfully — to 
find DNA evidence of syphilis or tuberculosis 
in bones dating from the nineteenth and early 
twentieth centuries⁵. “People largely ignored
each other,” he says.


DIGGING UP ANSWERS 

Although Poinar was dubious of claims about 
ancient microbial DNA, he was intrigued by 
the bones from East Smithfield. Nearly all of 
the remains are from Black Death victims, 
many of whom were cut down during the 
prime of their lives. 

In a bright ground-floor laboratory of the 
Museum of London, a short walk from East 
Smithfield, osteoarchaeologist Jelena Bekvalac 
examines the nearly complete skeleton of one 
of the plague pit’s former residents. Wearing a 
black silk scarf dotted with white skull-prints, 
Bekvalac handles a pelvic bone and determines 
that it belonged to a man who died in his late 
teens or early twenties. Apart from some plaque 
on his teeth and a gash in his skull that shows 
some signs of healing, the man's skeleton offers 
no outward evidence of Black Death. 

His remains, and those from hundreds of 
others, represent a snapshot of life and death 
in London during the epidemic. Since the site’s 
excavation, researchers have descended on the 
bones in search of information. 

In the late 1990s, Poinar met Sharon 
DeWitte, then a graduate student at 
Pennsylvania State University in State College, 
who was working on a demographic analysis 
of the remains suggesting that Black Death 
preferentially killed those who were already 
frail. The two considered drilling into teeth
and bones to find Y. pestis DNA, but Poinar
wasn't satisfied with the available detection
tools, which were still based on PCR. “We sort
of sat on the samples for a few years waiting for
all the stars to align,” says DeWitte, now at the
State University of New York at Albany. 

That alignment came from next-generation 
DNA sequencers, machines that read short 
snippets of DNA. The technology was perfect 
for sequencing DNA that has been damaged by 
spending hundreds of years underground. 

The sequencers allowed Svante Pääbo, a
palaeogeneticist at the Max Planck Institute
for Evolutionary Anthropology in Leipzig,
Germany, and his team to sequence a draft
of the Neanderthal genome⁶. But finding and
sequencing ancient pathogens in a human skel-
eton is much harder — like finding “needles in
the football field”, Poinar says — because their
genomes are 1,000 times shorter than that of
the Neanderthal and closely resemble those of
soil microbes that have infiltrated the bones.

Another technology helped narrow the
search. Pääbo and his team developed a tech-
nique, called targeted capture, in which they
used lab-synthesized ‘bait’ DNA to snag ancient
DNA strands from a bone sample⁷, leaving soil-
microbe and other sequences behind. “It’s pretty
much like fishing in a pond,” says Johannes
Krause, a palaeogeneticist at the University of
Tübingen in Germany, who worked with Pääbo
on the Neanderthal genome and co-led the
Black Death project with Poinar.

In a proof-of-principle experiment published
in August of this year, Krause and Poinar’s team 
used sequences from a contemporary plague 
strain to fish out Y. pestis DNA from the teeth 
of victims buried at East Smithfield. From this, 
they sequenced a short loop of DNA, called the 
pPCP1 plasmid, that is partially responsible for 
bubonic plague’s ability to infect humans. 

Their results⁸, along with a paper published
last year⁹ that found Y. pestis sequences in
different Black Death bone samples, have 
convinced most scientists that bubonic plague 
was involved in the Black Death. 

In their most recent paper¹, Poinar and
Krause completed the ancient genome and 
showed that it sits at the root of an evolutionary 
tree that comprises 17 contemporary strains 
of Y. pestis. This indicates that the Black Death 
strain spawned many of the forms of Y. pestis 
that infect humans today. 

This strain, Krause adds, probably emerged 
not long before the Black Death started its 
rampage across western Asia and Europe in 
the fourteenth century. “That, for me, was the 
biggest surprise,” he says. It suggests, the authors
argue, that earlier plagues were caused by either
a now-extinct strain of Y. pestis or by an entirely
different pathogen. 


Mark Achtman, a plague-evolution expert
at University College Cork in Ireland, calls this
interpretation “absolute nonsense”. Krause and Poinar’s team did not
consider a number of modern plague strains 
found in central and east Asia, which are 
thought to have earlier origins than the East 
Smithfield strain, Achtman says. Genome 
sequences for these strains were not available 
to his team, says Krause, but he is eager to see 
how they are related. 


MYSTERIOUS SCOURGE 

Just as puzzling, however, is that Y. pestis seems 
to have changed very little over the past 660 
years. The genome of the Black Death strain 
differs from that of the modern Y. pestis 
‘reference’ strain by about 100 nucleotides, 
but each of these genetic differences can be 
found in at least one contemporary strain. “We
can’t find anything that makes the Black Death
special,” Krause says.

Historical descriptions of the Black Death have
helped link Yersinia pestis with the disease.

The team is now looking for other genetic 
changes that could account for the Black 
Death's ferocity, such as rearrangements in 
the genome, which are difficult to determine 
from the short fragments of DNA available. 
To better understand how the plague worked, 
researchers could try to resurrect the Black 
Death pathogen by modifying the genomes of 
contemporary Y. pestis strains. Although this 
might sound alarming, research on Y. pestis 
is already carefully controlled, and even an 
accidental infection with such a strain could be 
easily treated with modern antibiotics. 

Moreover, Poinar says, the Black Death was 
not just about the bacterium. Environmental 
and epidemiological factors must have aided 
in its vicious tear through Europe. Sick soldiers 
returning to Europe from Caffa, the Black Sea 
port that was the plague’s gateway from Asia, 
unleashed the disease on a population that 
would have been weakened by malnourishment 
and years of cold, wet weather, he says. 

Achtman says that it is possible that Black 
Death was not spread by rat-dwelling fleas, as 
Y. pestis is today, but by other animals, which 
could have enhanced transmission. Or another
circulating pathogen could have contributed,
as in the ‘Spanish flu’ pandemic that killed up 
to 100 million people worldwide in 1918-19, 
often with the help of bacterial pneumonia. 

Whatever questions remain about the 
Black Death, scientists are now keen to 
apply the latest sequencing methods to other 
ancient epidemics. “I’ve completely gone 
from thinking, ‘ancient pathogens are a load
of crap’, to ‘hold on, maybe some of this stuff
works’,” says Gilbert, whose team has started
to sequence DNA from pathogens that plagued 
ancient crops. Researchers could identify 
ancient microbes and chart their spread and 
their evolutionary relationships with contem- 
porary strains. For example, Europeans who 
travelled to the New World may have intro- 
duced new forms of tuberculosis to North 
America and brought syphilis back to Europe. 

Ancient pathogens may help scientists 
understand current and future outbreaks, says 
Terry Brown, a biomolecular archaeologist at the 
University of Manchester, UK. He and Charlotte 
Roberts, of Durham University, UK, are 
charting the evolution of tuberculosis strains 
in Britain and Europe. “By looking over the 
past 1,000 years of disease in British cities, we 
can understand problems occurring in the 
Third World, where more and more people 
are crowding into cities,” he says. Similarly, the 
sequencing and resurrection of the influenza 
strain responsible for the 1918 pandemic¹⁰ has
helped researchers to interpret the sequences of 
contemporary flu strains. 

For all its ferocity, the Black Death left few 
visible marks on London. Today, the plague pit 
at East Smithfield is in the heart of London’s 
financial district, buried under modern office 
suites and the old Royal Mint building. The only 
visible remnants are the crumbled ruins of St 
Mary Graces, a Cistercian abbey built near the 
site in 1350. 

London may have seen its last significant 
bubonic plague outbreak, but catastrophic 
epidemics are a rule of human history, not an 
exception. Centuries from now, what traces 
will the next great scourge leave? Future 
archaeologists chronicling its history may 
find memorials, graves and probably even 
the bodies of victims. But another story will 
also lurk in its DNA, just waiting to be read.


Ewen Callaway writes for Nature from London.


Families in Bangladesh seek safer areas after severe floods in 2007.


Migration as 
adaptation 


Mobility can bring opportunities for coping with 
environmental change, say Richard Black, Stephen R. 
G. Bennett, Sandy M. Thomas and John R. Beddington. 


The effects of global environmental
change, including coastal flooding,
reduced rainfall in drylands and
water scarcity, will almost certainly alter
patterns of human migration. Conventional 
narratives usually cast these displacements 
in a negative light, with many millions of 
people forced to move, and tension and 
conflict the result. Our study suggests that 
the picture is not so one-sided. 

The study, the UK government's Foresight 
report on migration and global envi- 
ronmental change, examines the likely 
movement of people within and between 
countries over the next 50 years¹. It contends
that, although environmental change will 
alter an already complex pattern of human 
mobility, migration will offer opportunities 
as well as challenges. The greatest risks will 
be borne by those who are unable or unwill- 
ing to relocate, and may be exacerbated by 
maladaptive policies designed to prevent 
migration. It is time for a fresh discourse — 
and fresh research — on migration in rela- 
tion to global environmental change. 

International action and research are 
needed to identify the positive and nega- 
tive outcomes of migration influenced by 
environmental change. Whether movement 
occurs within or between countries, there is a 
need to prepare for it and in some cases enable 
it. It is important to deepen understanding of 
how migration will affect other types of social 
change, such as the evolution of cities, the 
formation of ‘poverty traps’ and the coexist-
ence of cultures. Current policy frameworks
should take account of these factors to avoid 
having to deal later with impoverishment and 
displacement under high-risk conditions. 


THE REALITY OF MIGRATION 
Many people across the world are already 
migrating, motivated by strong socio- 
economic factors. The United Nations 
estimates that there are about 210 mil- 
lion international migrants, but as many 
as 740 million internal (intranational) 
migrants. People migrate for complex
reasons: to improve incomes; to join fam- 
ily members; to escape 
persecution; and to remove themselves
from environmental or other threats, often
temporarily. Such
SOURCE: A. VAFEIDIS ET AL. ANALYSIS OF LAND AREA AND POPULATION IN THE LOW ELEVATION COASTAL ZONE (LECZ) (GOVERNMENT OFFICE FOR SCIENCE, 2011)


THE DRIVERS OF MIGRATION 


Many factors influence whether a person or family will migrate. Their effects are 
closely intertwined, so it makes little sense to consider any of them in isolation. 


SOCIAL DRIVERS 
Education, family/kin 


ENVIRONMENTAL DRIVERS 
Exposure to hazard, 
ecosystem services such as 
land productivity, habitability, 
food/energy/water security 


THE INFLUENCE OF 
ENVIRONMENTAL 
CHANGE ON DRIVERS 


ECONOMIC DRIVERS 
Employment opportunities, 
income/wages/well-being, 
producer prices (such as in 
agriculture), consumer prices 


drivers will change in their proportions
over coming decades, but how they influence
people's decisions about where they live will
not (see ‘The drivers of migration’).
Environmental factors will increasingly 
influence migration. Current greenhouse-gas 
emissions are already committing the planet 
to likely climate changes in the next 20 years. 
In Bangladesh, for example, moving to cities 
has become a common coping strategy in 
the face of flooding. In a 2008 study, 22% of 
households affected by tidal-surge floods, and 
16% affected by riverbank erosion, moved 
to urban areas. Diminished soil quality in
Kenya has led people to travel to diversify
their income. In 2004-05, for example, tem-
porary labour migration in households that
were farming land with high-quality soils was
67% lower than in those using poor soils.
Not everyone is able to migrate. There may 
be confounding socio-political factors, such 


URBAN COASTAL FLOOD RISK 


POLITICAL DRIVERS
Discrimination/persecution,
governance/freedom,
conflict/insecurity, policy
incentives, direct coercion


DEMOGRAPHIC DRIVERS
Population size/density,
population structure,
disease prevalence


PERSONAL/HOUSEHOLD CHARACTERISTICS
Age, sex, education, wealth, marital status,
preferences, ethnicity, religion, language


INTERVENING OBSTACLES AND FACILITATORS
Political/legal framework, cost of moving,
social networks, diasporic links, recruitment
agencies, techno


as in Somalia where armed conflict restricts
movement, or in New Orleans, Louisiana,
where the evacuation plan for Hurricane Kat-
rina assumed that everyone had access to a
car. Migration is often expensive, and those
most vulnerable to environmental change are 
usually poor. For example, in Uganda, a rela- 
tively settled country with high ‘entry costs’ 
for housing, schools and marriage, those 
who are wealthier are more able to relocate. 
In Mali, emigration decreased during the 
severe droughts of 1983-85 alongside a rise 
in rural poverty.


INTERLINKED FACTORS 

Environmental change can increase the
incentive to move, but it can also limit the
capacity to do so. It should be seen as affect-
ing the many linked drivers of migration.
People are as likely to migrate into places
of environmental vulnerability as away from

The number of people living in cities that are at risk of coastal flooding is set to increase dramatically over 
the coming decades in both ‘high’ and ‘low’ scenarios of economic growth and governance*. 


[Figure: number of people in cities at risk of coastal flooding, plotted by year and scenario for regions including Eastern Asia and Western Asia.]
*Low scenario = high economic growth and inclusive governance; high scenario = low global economic growth and fragmented governance
them — a point that has been insufficiently
acknowledged. In rapidly growing megacities,
such as Dhaka and Lagos, that are located in
delta and coastal floodplain regions in Africa
or Asia, hundreds of millions more people
may be at risk of flooding by 2060 (see ‘Urban
coastal flood risk’). Migrants stretch the
capacity of existing infrastructure, especially
in low-income countries, and new arrivals
are frequently vulnerable. In Dakar, Senegal,
for example, 40% of those who moved there
between 1998 and 2008 live in areas of high
flood risk.

Political instability, poor governance, 
conflict and social pressures compound 
these problems. For example, Zimbabwe's 
political and economic crisis, amplified in 
rural areas by drought, has contributed to 
the migration of between 1.5 million and 
2 million Zimbabweans to South Africa 
since 2000. In 2008, attacks against these 
migrants resulted in 65 deaths and the 
further displacement of 150,000 people.

In many cases of mass migration, especially 
when coupled with environmental hazard, 
humanitarian assistance might be needed. 
And such upheavals may have political 
ramifications. If sea level rise were to engulf a
small island state, for example, it would raise 
issues of sovereignty, and questions of who is 
responsible for displaced populations. 

Migration may be the most effective 
way to allow people to diversify income 
and build resilience where environmental 
change threatens livelihoods. It is therefore 
necessary to make channels for voluntary 
migration available. 

Within countries, this implies removing 
arbitrary restrictions on movement, and 


SOURCE: REF 1 
Poverty trap: residents of this slum in Luanda would find it hard to avoid environmental changes.


providing basic infrastructure to enable 
relocation and settlement in urban areas, ide- 
ally sustainably. Internationally, this might 
include the extension of regional economic 
communities to cover the free movement 
of people as well as money and goods. 
Those at risk of being trapped — the poor- 
est and least mobile — require additional 
measures, such as functional early-warning 
systems and tested emergency evacuation 
plans, to minimize their vulnerability to 
extreme events (see ‘Trapped populations’).


A ROAD MAP FOR ACTION

For international policy-makers, climate 
mitigation and the reduction of negative 
environmental change should continue to 
be priorities. But mechanisms for funding 
adaptation to climate change also need to 
account for migration as a way of build- 
ing resilience. It is therefore important that 
long-term initiatives, including for example 
those instigated under the United Nations 
Framework Convention on Climate Change, 
recognize the links between global environ- 
mental change and migration. 

These initiatives should consider the 
realities of migration, including benefits as 
well as challenges. And they should focus on 
the resilience of populations that are mov- 
ing to, or are trapped in, urban areas that 
are vulnerable to environmental change, 
particularly in low-income countries. 

Other actions to boost resilience — 
including sustainable urbanization, climate- 
smart development, conflict resolution and 
emergency preparedness — need to take 
account of an increased propensity for peo-
ple to migrate. Planners will need to provide
flood-control, water-management, forecast-
ing and warning capacities to growing urban
populations. Furthermore, migrants may be 
socially excluded and so will need special 
attention. 

To increase the potential benefits of 
international migration, policies can link its 
adaptive advantages for some migrant com- 
munities to demographic deficits and labour 
shortages in potential host nations. Circular 
migration schemes are one option, to allow 
those in environmentally vulnerable areas to 
work seasonally or on a temporary basis in 
countries where their skills are in demand. 
A strategic international approach to migra- 
tion also needs to pay attention to regional 
and global demand for skilled workers in 
particular sectors. 

Whole populations need not abandon 


TRAPPED POPULATIONS 


Impoverished people face a double set of 
risks. They are unable to move away from 
environmental threats, and their lack of 
capital makes them especially vulnerable to 
environmental changes. 
their former homes. Migration of some
individuals can help a community to remain 
viable in the long run, for example if money 
and goods are sent back to help build their 
resilience. In Africa, where the majority of 
international migrants stay within their 
subregion, such remittances to home com- 
munities quadrupled to nearly US$40 billion 
between 1990 and 2010, surpassing official 
development assistance since 2007 (ref. 10). 

Researchers in the fields of development, 
climate and environmental science, and 
climate adaptation need to pay more atten- 
tion to migration. A better understanding 
is required of the extent to which migration 
influences vulnerability and resilience in the 
face of environmental change. So, too, is clar- 
ity on the adequacy of policy responses to 
address the impact of global environmental 
change on migrant and non-migrant com- 
munities. Such knowledge must be based 
on empirical research, and underpinned by 
longitudinal data on migration flows. 

It is vital for the research community to 
provide insights into what outcomes can 
be expected, and the Foresight project pro- 
vides a framework with which to start this 
endeavour.
Hundreds of thousands fled the 1930s US Dust Bowl; more drought-spurred migrations are expected.




The next 
dust bowl 


Drought is the most pressing problem caused by 
climate change. It receives too little attention, says 
Joseph Romm. 


Which impact of anthropogenic
global warming will harm the
most people in the coming dec-
ades? I believe that the answer is extended or
permanent drought over large parts of cur- 
rently habitable or arable land — a drastic 
change in climate that will threaten food secu- 
rity and may be irreversible over centuries. 

A basic prediction of climate science is 
that many parts of the world will experience 
longer and deeper droughts, thanks to the 
synergistic effects of drying, warming and 
the melting of snow and ice. 

Precipitation patterns are expected to 
shift, expanding the dry subtropics. What 
precipitation there is will probably come in 
extreme deluges, resulting in runoff rather 
than drought alleviation. Warming causes 
greater evaporation and, once the ground 
is dry, the Sun’s energy goes into baking the 
soil, leading to a further increase in air temp- 
erature. That is why, for instance, so many 
temperature records were set for the United 
States in the 1930s Dust Bowl; and why, in
2011, drought-stricken Texas saw the hottest
summer ever recorded for a US state. Finally,
many regions are expected to see earlier 
snowmelt, so less water will be stored on 
mountain tops for the summer dry season. 
Added to natural climatic variation, such as 
the El Niño–La Niña cycle, these factors will
intensify seasonal or decade-long droughts. 
Although the models don't all agree on the 
specifics, the overall drying trends are clear. 

I used to call the confluence of these pro- 
cesses ‘desertification’ on my blog, Climate- 
Progress.org, until some readers pointed out 
that many deserts are high in biodiversity, 
which isn't where we're heading. ‘Dust-
bowlification’ is perhaps a more accurate
and vivid term, particularly for Americans 
— many of whom still believe that climate 
change will only affect far-away places in 
far-distant times. 


Prolonged drought
will strike around the
globe, but it is surpris-
ing to many that it
would hit the US heartland so strongly and
so soon. 

The coming droughts ought to be a major 
driver — if not the major driver — of climate 
policies. Yet few policy-makers and journal- 
ists seem to be aware of dust-bowlification 
and its potentially devastating impact on 
food security. That’s partly understandable, 
because much of the key research cited in 
this article post-dates the 2007 Fourth Assess- 
ment Report of the Intergovernmental Panel 
on Climate Change (IPCC). Raising public 
awareness of, and scientific focus on, the 
likelihood of severe effects of drought is the 
first step in prompting action. 


AMERICAN NIGHTMARE 

I first heard of the risks in a 2005 talk by 
climatologist Jonathan Overpeck of the Uni- 
versity of Arizona in Tucson. He pointed to 
emerging evidence that temperature and 
annual precipitation were heading in oppo- 
site directions over many regions and raised 
the question of whether we are at the “dawn 
of the super-interglacial drought”. 

The idea wasn't new. As far back as 1990,
scientists at NASA’s Goddard Institute for
Space Studies in New York projected that
severe to extreme drought in the United
States, then occurring every 20 years or so,
could become an every-other-year phenom-
enon by mid-century.

Events are starting to bear out these 
worrying predictions. Snowpack reduction, 
early snowmelt and a decrease in dry-season 
river flow in the American West, forecast 
more than two decades ago, have now been 
measured. In much of the northern Rock-
ies, the peak of the annual stream runoff is
up to three or four weeks earlier than it was
half a century ago. Heat and drought —
coupled with the greater impact of destruc- 
tive species, such as bark beetles, aided by 
warming — have increased forest die-off 
and the risk of wildfire. 

The palaeoclimate record dating back to 
the medieval period reveals droughts lasting 
many decades. But the extreme droughts that 
the United States faces this century will be far 
hotter than the worst of those: recent decades 
have been warmer than the driest decade of 
the worst drought in the past 1,200 years.

And much warmer conditions are pro- 
jected. According to a 2009 report of the US 
Global Change Research Program, warming
over mid-latitude land masses, such as the 
continental United States, is predicted to be 
higher than the forecast average global warm- 
ing: much of the inland United States faces a 
rise of between 5°C and 6°C on the current 
emissions path (that is, ‘business as usual’) by 
the century’s end, with a substantial fraction 
of that warming occurring by mid-century. 

A 2007 analysis of 19 climate projections 
estimated that levels of aridity comparable 
to those in the Dust Bowl could stretch from 
Then and now: sun-baked dry soils kick up clouds of dust in the 1930s (left) and in the modern United States (right).


Kansas to California by mid-century. To 
make matters worse, the regions at risk of 
reduced water supply, such as Nevada, have 
seen a massive population boom in the past 
decade. Overuse of water in these areas has 
long been rife, depleting groundwater stores. 

Of course, the United States is not alone 
in facing such problems. Since 1950, the 
global percentage of dry areas has increased 
by about 1.74% of global land area per dec-
ade. Recent studies have projected ‘extreme
drought’ conditions by mid-century over 
some of the most populated areas on 
Earth — southern Europe, south-east Asia, 
Brazil, the US Southwest, and large parts 
of Australia and Africa. These dust-bowl
conditions are projected to worsen for many 
decades and be “largely irreversible for 1,000 
years after emissions stopped”. 

The concept of drought has not been 
ignored by the IPCC and other scientific 
groups; there is even a United Nations Con- 
vention to Combat Desertification. But the 
cumulative risks don’t seem to have been 
fully recognized by the public and by policy- 
makers. And key questions remain to be 
answered, ideally in a dedicated report by an 
organization such as the US National Acad- 
emy of Sciences or the IPCC. 


UNANSWERED QUESTIONS 
Most pressingly, what will happen to global 
food security if dust-bowl conditions become 
the norm for both food-importing and food- 
exporting countries? Extreme, widespread 
droughts will be happening at the same time 
as sea level rise and salt-water intrusion 
threaten some of the richest agricultural 
deltas in the world, such as those of the Nile 
and the Ganges. Meanwhile, ocean acidifica- 
tion, warming and overfishing may severely 
deplete the food available from the sea. 
What are the implications of dust-
bowlification for energy generation? After
agriculture, energy generation is responsible
for the majority of freshwater withdrawals, 
and two key strategies for generating addi- 
tional potable water — wastewater purifica- 
tion and desalinization — are both energy 
intensive. Future energy systems will need 
to be low on greenhouse-gas emissions and 
on water use. In particular, thermal power 
plants — including nuclear — may need 
to switch from evaporative or ‘wet cooling’ 
systems to dry cooling 
techniques, which, 
unfortunately, tend to 
be less efficient. 
From an ecological 
perspective, what will 
be the effects of dust- 
bowlification on the 
global carbon cycle? 
In the past six years, 
the Amazon has seen 
two droughts of the sort expected once in 
100 years, each of which may have released 
as much carbon dioxide from vegetation die-
off as the United States emits from fossil-fuel
combustion in a year. More frequent wildfires 
also threaten to increase carbon emissions. 
And as habitats are made untenable, what 
will be the effect on biodiversity? 

At the same time, drought models need 
to be improved. They successfully chart the 
hydrological changes seen in the US South-
west and the drying seen at the global level,
but regional predictions can be disturbingly 
variable. Some models forecast an increase 
in precipitation for East Africa, whereas oth- 
ers correctly predicted in 2010 that warming 
of the Indian Ocean would lead to drought 
in the region, such as this year’s devastating 
drought in Somalia. The models need higher 
resolution and a better understanding of pre- 
cipitation, sea surface temperature and the 
effects of vegetation. 

Human adaptation to prolonged,
extreme drought is difficult or impossible.
Historically, the primary adaptation to 
dust-bowlification has been abandonment; 
the very word ‘desert’ comes from the Latin 
desertum for ‘an abandoned place’. During
the relatively short-lived US Dust-Bowl era, 
hundreds of thousands of families fled the 
region. We need to plan how the world will 
deal with drought-spurred migrations (see 
page 447) and steadily growing areas of non- 
arable land in the heart of densely populated 
countries and global bread-baskets. Feeding 
some 9 billion people by mid-century in the 
face of a rapidly worsening climate may well 
be the greatest challenge the human race has 
ever faced. 

These predictions are not worst-case 
scenarios: they assume business-as-usual 
greenhouse-gas emissions. We can hope 
that the models are too pessimistic, but some 
changes, such as the expansion of the subtrop- 
ics, already seem to be occurring faster than 
models have projected. We clearly need to
pursue the most aggressive greenhouse-gas 
mitigation policies promptly, and put dust-
bowlification atop the world agenda.


Joseph Romm is a physicist who edits the 
blog ClimateProgress.org for the Center 
for American Progress Action Fund, 
Washington DC, USA. 
Despite multiple conflicts and two world wars in the past century, societal evolution has led to an overall decline in violence and death.


PSYCHOLOGY 


A farewell to arms 


Martin Daly explores Steven Pinker’s treatise on the 
taming of human aggression. 


Wars and genocides, murder and
mayhem — violent victimization 
seems to be rising inexorably. 


But the massive coverage of these horrific 
phenomena masks an important truth. On 
average, our chances of being assaulted or 
killed have been falling for centuries. In 
The Better Angels of Our Nature, psycholo- 
gist Steven Pinker reviews the evidence for 
this stunning historical trend, and tries to 
explain it. 

In what is arguably his most ambitious 
work yet, Pinker includes figures showing 
declining rates of homicide, warfare, acts 
of terrorism, child abuse and other forms of 
violence over various timescales. But there is 
more here than statistics. Pinker’s narrative 
moves from prehistory, through the social 
and intellectual revolutions of the eighteenth 
and nineteenth centuries, to current findings 
on mind, brain and behaviour. Citing the 
insights and scholarship of not just the usual 
gang of psychologists, neuroscientists and 
evolutionary biologists, but also of historians,
philosophers and every sort of social scien- 
tist, he concludes that societal evolution has 
reduced the incentives to commit violence 
and changed modern sensibilities. 

The intellectual hero of Pinker’s story — a 


man he calls “the most 
important thinker you 
have never heard of” 
— is the German-born 
sociologist Norbert 
Elias. In the late 1930s, 
he proposed that 
recent centuries have 
witnessed a ‘civilizing 
process’ in the form of 


a growing regard for
others’ interests and
improved self-control.
According to Elias,
the civilizing process
had two causes. One
was the consolidation
of governments (as
described in Thomas
Hobbes’s 1651 book
Leviathan) that could monopolize legitimate
violence and arbitrate disputes, reducing
the need for private retribution. The other
was the rise of ‘gentle commerce’, whereby
mutual gains from trade created a common
purpose.

After introducing and defending Elias’s
ideas, Pinker applies them to history.
Reviewing the ‘humanitarian revolution’, he
describes how, in Europe and elsewhere, the
xenophobia that was once ubiquitous became
untenable over several centuries. Torture,
execution at whim and slavery also moved
from the mainstream to the marginal.


The Better Angels of Our Nature: Why Violence Has Declined/The Decline of Violence in History and Its Causes
STEVEN PINKER
Viking/Allen Lane: 2011. 832 pp. $40/£30

Pinker then makes the case that warfare 
has long been in decline — and may be fac- 
ing extinction. Scholarly analysis of armed 
conflicts and their death tolls apparently 
demonstrates that a bias towards recency 
has blinded us to this startling truth: that 
both the incidence of war and the death rate 
it imposes have been shrinking. Even in the 
twentieth century, with its two world wars, 
these numbers were lower than in previous 
centuries. And they have kept falling. Pinker 
chronicles the ‘rights revolutions’ of the 
twentieth century: struggles for civil rights, 
women’s rights, children’s rights, gay rights 
and animal rights, with thought-provoking 
discussion of the rapidly changing sensi- 
bilities that accompanied them. His skill in 
mixing quantitative analysis with illustrative 
examples, apt quotations and the occasional 
joke makes these chapters page-turners. 

Pinker then turns his attention to the links 
between the history of violence and his view 
— familiar from his previous books — that 
the evolved human psyche is a bundle of 
special-purpose ‘faculties’ including cer- 
tain “inner demons” and “better angels”. 
He reviews what neuroscience has to say 
about aggression and empathy, and what 
social psychologists have discovered about 
the elicitors of sympa-
thy, punitiveness and
other mental states.
a surprising, force-
historical rise in human reasoning ability,
which he believes may provide a counter- 
weight to parochialism and intergroup 
hostility. 

Any thoughtful reader of this wide- 
ranging treatise will find nits to pick. I 
am not persuaded that because 1960s 
counterculture glorified selfish impul- 
siveness (‘just do it’), it was responsible 
for a small increase in the US homicide 
rate. The counterculture was largely 
about forsaking violence (‘make love, 
not war’). Neither am I persuaded by 
Pinker’s explanations for the decline in 
US homicide in the 1990s; in my view, 
he too hastily dismisses the possible 
relevance of demographic changes, and 
too credulously accepts that increased 
police presence and incarceration were 
important. 

Pinker’s biggest slip, in my view, con- 
cerns the relevance of income inequality, 
which has been the most successful pre- 
dictor of variability in homicide rates 
between different places. Pinker gives 
it one brief paragraph, waving it off on 
the grounds that the standard index of 
income inequality was going up dur- 
ing the 1990s in the United States while 
crime rates were falling, and was at a 
nadir in 1968 when crime was “soaring”. 
The trouble with this argument is that 
there is no reason to expect simultane- 
ous short-term vicissitudes of income 
inequality and homicide; any effect of 
the former on the latter is surely medi- 
ated by people’s cumulative experiences 
over their lifetimes. And it is ironic that 
despite Pinker’s dismissal, the big histori- 
cal story he tells — stressing the decline 
of despotism and marauding and the rise 
of democratic governments — is itself a 
tale of decreasing inequality. 

Pinker closes with a rousing defence 
of modernity. Ultimately, his explana- 
tion for the decline of violence is Elias’s 
— that the synergistic impacts of Levia- 
than and gains from trade have created a 
civilizing process that has diminished the 
utility of violence and, hence, its appeal. 
But he elaborates on this with an engag- 
ing game-theoretical twist (the “Pacifist’s 
Dilemma”), and more-up-to-date psy- 
chology than Elias would have been 
able to muster. The Better Angels of Our 
Nature is a lively, fascinating read and a 
remarkable scholarly achievement that 
deserves to be studied and debated by 
many social scientists, concerned citizens 
and policy-makers.


Martin Daly is professor of psychology 
in the Department of Psychology, 
Neuroscience and Behaviour, McMaster 
University, Ontario L8S 4K1, Canada. 
e-mail: daly@mcmaster.ca 
A biology graduate’s back carries a reminder that DNA gave rise to all the biodiversity on Earth.


SOCIOLOGY 


The illustrated scientist 


Margo DeMello is fascinated by the evocative tattoo 
culture among different ‘tribes’ of scientists. 


Tattoos were taboo until recently in the
West — seen by most as the barbaric
practice of marginalized or under-
world groups. Now, tattooing is undergoing
a renaissance. Almost mainstream in Europe 
and North America, tattoos are becoming 
ever more artistically sophisticated and 
personally meaningful. 

Carl Zimmer's beautiful new book, Science 
Ink, focuses on tattoo culture among scien- 
tists, both amateur and professional. Zimmer, 
himself a tattoo-free science writer, began 
asking researchers to send photographs of 
their science-related tattoos to The Loom, his 
blog for Discover Magazine, in August 2007. 
These, and the stories behind them, evolved 
into Science Ink. 

The book is broadly organized by disci- 
pline, featuring photos of tattoos themed to 
each — astronomy, chemistry, evolution, nat- 
ural history, neuroscience and palaeontology. 
The scientists are using their body art to mark 
their standing as members of these ‘tribes’: so 
you see stars on astronomers, bacteria on bio- 
chemists, insects on entomologists and equa-
tions and symbols on mathematicians. And
there are molecules of
every type, including 
pages of double helices. 

Some designs are
iconic, such as E = mc²;
or personal, like the
chemist’s tattoo of the
molecular structure of
phenobarbital, a drug
he gives to his cat to
control its seizures.
One Loom reader sent
in a sequence of zeroes
and ones — the name
of his daughter Lain in
binary code. Some of
the tattoos are simple line drawings. Many
are colourful and stunningly detailed — such
as the elaborate picture on a mathematician’s
back of a microscope and the usually hidden
world it reveals.


Science Ink: Tattoos of the Science Obsessed
CARL ZIMMER
Sterling: 2011. 288 pp. $24.95, £16.99

These decorated scientists join a tradition 
that is both venerable and near universal. 
The earliest evidence for tattooing dates back 
to Neolithic Eurasia. From there it probably 
spread from the Middle East to the Pacific 


M. QUAST 


Islands, and later to the Americas, by way of 
India, China and Japan. By 3,000 years ago 
it was found almost everywhere, and today 
remains rare only in sub-Saharan Africa. 

As permanent body art, similar to scarifi- 
cation, tattoos typically marked permanent 
or semi-permanent aspects of social posi- 
tion, such as rank or marital status. Today, 
they still serve this purpose, among others. 
As I wrote in Bodies of Inscription (Duke 
University Press, 2005), when the middle 
classes began getting tattoos, they also began 
to create “tattoo narratives”: stories relating 
why they got the tattoo, how long they had 
thought about it, the genesis of the design 
and its meaning, the tattooing experience 
and what the tattoo means to them now. 

For professionals, these narratives are par- 
ticularly important. As trailblazers in their 
class, they need to create new meanings for 
their tattoos; underworld or working-class 
narratives are not relevant to them. New 
narratives are important for personal as well 
as social and ‘tribal’ reasons — the scientists 
don’t want their choices to seem random or 
impulsive. 

Many of the scientists’ designs are not 
easily understandable without knowing 
the story behind them. For example, the 
tattoo that inspired Zimmer's quest was a 
double helix acquired by one of his friends, 
a neurobiologist. But it isn’t just any DNA: it 
also spells out the name of that friend’s wife. 
Another couple featured in the book have 
matching tattoos of chromosomes splitting 
during meiosis; those with no basic under- 
standing of biology would have a hard time 
grasping the literal or metaphorical meaning 
of their squiggles without a narrative. 

Science Ink is packed with fascinating 
stories. One of the most moving is Abigail's. 
A chemistry student, she sent in a photo of 
her tattoo — the word ‘entropy’ inked on her 
back. A few months later, her mother sent 
Zimmer a note saying that Abigail had died 
in a car accident and that she was getting
her daughter’s tattoo replicated on her own 
body. That blog post and the comments it 
generated became a memorial for Abigail, 
and eventually led to a posting by a woman 
whose mother had received Abigail's lungs 
after her death. 

We call tattoos permanent, but they last 
only as long as the body that wears them 
survives. Abigail's tattoo has a life beyond 
her own: the design now adorns the head- 
stone marking her grave. And it is there in 
the pages of Science Ink — one of many signs 
of an enduring fervour for science, and a new 
chapter in the age-old history of body art. 


Margo DeMello is a cultural anthropologist 
and author of Bodies of Inscription: A 
Cultural History of the Modern Tattoo 
Community. 

e-mail: margo@animalsandsociety.org 
Books in brief 


Missing Links: In Search of Human Origins 

John Reader OXFORD UNIVERSITY PRESS 350 pp. £25 (2011) 

The cast of ancient superstars in palaeoanthropologist John Reader’s 
book has grown significantly in the 30 years since the first edition. 
Neanderthal Man, Lucy and other early hominin fossils are joined by 
finds from Homo floresiensis to Ardipithecus in a stunningly illustrated 
update. Powered by enthusiasm and peppered with controversy, the 
search for human origins is laid out clearly and succinctly, from the 
first fossils and Victorian revelations, to frauds such as Piltdown Man 
and triumphs such as the ‘world’s oldest child’: the Australopithecus 
afarensis fossil unearthed in Ethiopia and called Selam (‘peace’). 


American Madness: The Rise and Fall of Dementia Praecox 
Richard Noll HARVARD UNIVERSITY PRESS 390 pp. £33.95 (2011) 
Between 1895 and the 1930s, tens of thousands of Americans 
were diagnosed with dementia praecox — an ‘incurable’ psychosis 
described by German psychiatrist Emil Kraepelin. The diagnoses 
then petered out. Psychologist Richard Noll traces the trajectory 
of this near-forgotten disorder, showing how it became the first 
specified disease of psychiatry, legitimizing that field’s place in 
medicine. Noll also shows how the debates today around the 
successor to dementia praecox, schizophrenia, are leading to a trend 
in psychiatry towards diagnoses that could fit better with genetics. 


Galileo’s Muse: Renaissance Mathematics and the Arts 

Mark A. Peterson HARVARD UNIVERSITY PRESS 336 pp. £21.95 (2011) 
The great scientist Galileo Galilei was also a gifted draftsman and 
accomplished musician, steeped in Renaissance poetry. But art 
was no side interest for Galileo, physicist Mark Peterson claims. The 
mathematical inspiration for his findings, such as four of Jupiter’s 
moons, was fished from the humanist stream then flowing so 
powerfully in Italy. So it was Dante’s Inferno, Filippo Brunelleschi’s 
great domes and artist-innovators from Piero della Francesca 
to Leonardo da Vinci, not the medieval tag ends of science, that 
inspired Galileo and ignited the Enlightenment, Peterson argues. 


What Doesn’t Kill Us: The New Psychology of Posttraumatic Growth 
Stephen Joseph BASIC BOOKS 288 pp. $26.99 (2011) 

Tsunamis, assault, near-death accidents: such experiences are 
popularly imagined to scar victims ‘for life’ and leave them in thrall 
to post-traumatic stress disorder. After two decades of research, 
positive psychologist Stephen Joseph argues that, for many, these 
traumas can become an “engine for transformation”. Backed by 
case studies, he covers trauma’s emotional toll, the underlying 
biology, the realities of resilience and the array of therapies on offer, 
such as trauma-focused cognitive behaviour therapy. This is a 
thorough and common-sense look at the psychology of survival. 


The Unconquered: In Search of the Amazon’s Last Uncontacted Tribes 

Scott Wallace CROWN 512 pp. $26 (2011) 

Conquering civilizations have ebbed and flowed through Latin 
America, but uncontacted tribes such as the flecheiros (or Arrow 
People) still survive deep in the Amazon rainforest. Now their 
home and culture are threatened by deforestation, epidemics and 
marginalization. Journalist Scott Wallace takes us on a journey 
through a warzone where irreplaceable habitats and the knowledge 
of traditional peoples are the casualties. 
BIOTECHNOLOGY 


DNA dollars 


Linnaea Ostroff examines a history of Genentech, the 
US company that first made biology a business. 


As the mysteries and mechanics of 
DNA were being revealed, it was 
unclear whether the molecule would 
be used for good or evil. Debates raged: 
utopian fantasies of ending disease and 
famine competed with fears of mutated life 
forms running amok. A suitably startling, 
if less popcorn-worthy, event occurred in 
October 1980, when the promise of DNA 
modification raised US$35 million in a 
landmark initial public offering (IPO), 
which saw the fastest stock-price rise in the 
market’s history. 

The record-breaking IPO was that of 
Genentech, a small company based in San 
Francisco, California, whose plan was to 
produce drugs using recombinant DNA 
technology. This was the first commercial 
manipulation of DNA and the first sale of 
biological science as a commodity in its 
own right. The biotech industry was born. 
Genentech’s unique corporate structure, 
which blurred the boundary between aca- 
demia and industry, was swiftly imitated. The 
sometimes uncomfortable entanglement of 
publicly funded basic research with private 
business enterprise persists to this day. 

Genentech: The Beginnings of Biotech 
SALLY SMITH HUGHES 
University of Chicago Press: 2011. 232 pp. $25, £16 

Genentech by science historian Sally Smith Hughes gives a detailed 
account of the founding and early years of the company. Much of the 
material in the book comes from oral histories collected by Hughes, 
along with written archival material. Hughes’s book is not, however, 
a journalistic analysis of a unique and important company: it is an 
account of the key players, as told to a sincere admirer. 

“Although Genentech’s business model was groundbreaking, the 
long-term strategy was a classic risk.” 

Nevertheless, Genentech’s achievements in science, medicine and 
business were momentous. One of the company’s co-founders, Herbert 
Boyer, a molecular biologist at the University of California, San 
Francisco, was at the time a leader in the development of recombinant 
DNA technology. Boyer and others had recently discovered a means of 
reorganizing (recombining) the sequence of DNA molecules, and were 
pursuing a method to use this engineered DNA to generate proteins. 
This had profound implications for drug production and development. 
Whereas most drugs had been discovered by large-scale screening of 
synthetic chemicals, a handful, such as insulin, were natural proteins 
whose production in the body was impaired in diseases such as 
diabetes. Proteins have exceptionally complex structures, and it is 
still too difficult to routinely synthesize them from scratch. 
Therapeutic proteins were at the time 
sourced from animals’ organs and human 
cadavers, making their supply and safety 
unreliable. In theory, recombinant DNA 
could provide a safe, consistent source of 
this class of therapeutics. 

Boyer’s group was working on a way to 
coax bacterial cells to produce therapeutic 
proteins from recombinant DNA. More 
importantly, recombinant DNA presented 
a means of designing drugs using the bio- 
logical mechanisms of a particular disease, 
which seemed to be an obvious advance 
over the pharmaceutical industry’s random 
screening procedures. Hughes does not, 
however, touch on any of this, leaving the 
reader to wonder why recombinant DNA is 
viewed as so useful. 

The reasons the IPO was so successful, 
and why that success was so shocking, are 
also underdeveloped in the book. At the 
time it went public, Genentech had the 
intention of making pharmaceuticals but 
had no actual drugs in the pipeline. What 
it did have was a contract with Eli Lilly, the 
largest producer of synthetic insulin. The 
contract was the first of its kind: Eli Lilly 
was not paying Genentech to produce insu- 
lin, nor licensing a method to do so, but was 
paying it to do the basic scientific research 
needed to develop a method. Never before 
had an independent group of scientists 

contracted with a for- 
company offered research as its sole source 
of revenue. 

A patent on recombinant DNA tech- 
niques was granted in 1980 to Stanford Uni- 
versity, California, and to the University of 
California, where Boyer and his colleagues 
had developed the technology. The assur- 
ance of intellectual-property protection for 
genetic-engineering methods and products 
encouraged the explosion of the biotechnol- 
ogy sector, as academic researchers began to 
independently commercialize their findings. 
The now commonplace practice of scientists 
maintaining ties to both universities and 
their own associated companies, along with 
the conflicts it creates, comes directly from 
Genentech’s initial arrangement. 

Although Genentech’s business model was 
groundbreaking in its mechanics, the long- 
term strategy was a classic risk. Genentech’s 
insulin was intended to be the Gutenberg 
Bible of recombinant DNA technology — an 
established product made in a new way with 
a guaranteed market. Yet the route between 
basic knowledge of a disease process and an 
effective therapy is punishing, and many 
subsequent designer drugs generated using 
the method proved not to be viable. 

Rational drug design has not overtaken 
traditional drug-discovery approaches, and 
biotechnology development is shifting back 
to large pharmaceutical companies, which 
can hedge risk internally — although the 
future of drug discovery is a legitimate con- 
cern. Genentech itself is now wholly owned 
by Swiss pharmaceutical giant Roche. 

The scant objectivity, the somewhat 
plodding chronology of unfolding events 
and the sparse explanations of technical 
terminology in Hughes’s account aside, 
Genentech’s story remains a compelling 
one. It neatly reveals the divergent chal- 
lenges of basic science, medical science 
and business, and despite its novelty, the 
tale illustrates several enduring principles 
of science and markets. 

In shifting genetic-engineering research 
from academia to industry, Genentech 
and the industry it founded accelerated 
the development and distribution of medi- 
cally and agriculturally valuable products. It 
triggered practical decisions on policy and 
regulation, while effectively sidestepping 
philosophical and ethical questions about 
the uses of DNA: the market would decide 
what DNA should be used for. Genentech’s 
business model shunted private money 
directly into basic research, drew inves- 
tors into basic science and academic sci- 
entists into business. Even as the industry 
reorganizes, these relationships remain. 


Linnaea Ostroff is a researcher at the 
Center for Neural Science, New York 
University, New York 10003, USA. 
e-mail: lostroff@nyu.edu 




Q&A Persi Diaconis 
The mathemagician 


Mathematician Persi Diaconis of Stanford University in California ran away from home in his 
teens to perform card tricks. As he publishes a book on the mathematics of magic, co-authored 
with juggler and fellow mathematician Ron Graham, he explains what makes a good trick. 


Which came first for you, magic or maths? 

Magic came first. When I was five, I found 
a magic book in the attic and started doing 
shows. I was a terrible magician but the 
other kids liked it. In high school I had a 
good geometry teacher but had no interest 
in mathematics and never did homework. 
When I was 13, I met Alex Elmsley, a soft- 
spoken British computer engineer and 
magician, at a magic shop in New York City. 
He showed me that eight perfect shuffles 
would put a deck back in its original order. 
His ingenious method for moving the top 
card to a desired position within the deck 
was my introduction to binary numbers. 
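
The arithmetic behind both of Elmsley's observations can be checked with a short sketch. It assumes a 52-card deck and the usual definitions of a perfect 'out' shuffle (the top card stays on top) and a perfect 'in' shuffle (the top card moves to second place). The function names are invented for illustration, and the binary rule shown is the commonly described form of the method, which the interview itself does not spell out.

```python
def out_shuffle(deck):
    """Perfect shuffle that leaves the original top card on top."""
    half = len(deck) // 2
    top, bottom = deck[:half], deck[half:]
    return [card for pair in zip(top, bottom) for card in pair]


def in_shuffle(deck):
    """Perfect shuffle that moves the original top card to second place."""
    half = len(deck) // 2
    top, bottom = deck[:half], deck[half:]
    return [card for pair in zip(bottom, top) for card in pair]


def move_top_card(deck, position):
    """Bring the top card to `position` (0 = top) using perfect shuffles.

    Assumed rule: read `position` in binary, most significant bit first;
    do an in-shuffle for each 1 and an out-shuffle for each 0.
    """
    for bit in bin(position)[2:]:
        deck = in_shuffle(deck) if bit == "1" else out_shuffle(deck)
    return deck


deck = list(range(52))

# Eight perfect out-shuffles restore the original order.
shuffled = deck
for _ in range(8):
    shuffled = out_shuffle(shuffled)
print(shuffled == deck)

# Card 0 starts on top; send it to position 10 (binary 1010).
print(move_top_card(deck, 10).index(0))
```

The rule works because an out-shuffle sends the card at position p to 2p and an in-shuffle sends it to 2p + 1, so each shuffle appends one binary digit to the card's position.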


How did you pursue magic? 

The sleight-of-hand artist Dai Vernon invited 
me to Delaware to do some magic shows 
when I was 14 — and I never went back home. 
We found crooked gamblers to learn their 
techniques. There are often tricky probability 
questions involved. I didn't have much proba- 
bilistic intuition and made every boneheaded 
mistake you could make. During ten years on 
the road, I learned the hard way. 


Can you describe one of the best tricks you 
learned? 

In 1916 the pioneering US magician Charles Thornton Jordan advertised 
a trick called ‘Long-Distance Mind Reading’. He would mail you an 
ordinary deck of cards, ask you to cut and shuffle twice, then draw 
a card and restore it to the middle of the deck. After you shuffled 
again and returned the deck to him, he would name your card. It is a 
wonderful trick that fooled everybody and didn't seem mathematical. 
It works because the deck is arranged in a special order. When you 
shuffle once, the deck is split into two alternating sequences (or 
‘chains’). Two shuffles makes four interlocking chains, and three 
shuffles makes eight chains. When he got the deck back, he would play 
a sort of solitaire to isolate the one card that was not in any of 
these chains. That was the chosen card. 


Magical Mathematics: The Mathematical Ideas that Animate Great Magic Tricks 
PERSI DIACONIS AND RON GRAHAM 
Princeton University Press: 2011. 258 pp. $29.95, £20.95 
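
The chain structure behind Jordan's trick can be seen in a small simulation. The sketch below is only an illustration under stated assumptions: it models a riffle shuffle with the Gilbert-Shannon-Reeds scheme (a binomial cut followed by a random interleave), it ignores the cuts and the removed card, and the function names are hypothetical. It counts the chains, usually called rising sequences, left in a deck that started in order.

```python
import random


def riffle(deck, rng):
    """One riffle shuffle under the Gilbert-Shannon-Reeds model (an assumption)."""
    n = len(deck)
    cut = sum(rng.random() < 0.5 for _ in range(n))  # binomial cut point
    left, right = deck[:cut], deck[cut:]
    out, i, j = [], 0, 0
    while i < len(left) or j < len(right):
        rem_left, rem_right = len(left) - i, len(right) - j
        # Drop the next card from a packet with probability
        # proportional to the number of cards still in that packet.
        if rng.random() * (rem_left + rem_right) < rem_left:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out


def count_chains(deck):
    """Number of rising sequences in a deck holding the cards 0..n-1."""
    position = {card: index for index, card in enumerate(deck)}
    breaks = sum(position[c + 1] < position[c] for c in range(len(deck) - 1))
    return breaks + 1


rng = random.Random(2011)
deck = list(range(52))
for shuffles in range(1, 4):
    deck = riffle(deck, rng)
    # Each riffle can at most double the number of chains.
    print(shuffles, count_chains(deck), "<=", 2 ** shuffles)
```

A card that has been pulled out and replaced elsewhere no longer fits any of these chains, which is what makes it stand out.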


How did you come to study shuffling theory? 
Jordan’s trick led me to ask how many times 
you have to shuffle a deck of cards to mix 
them up properly. People often ask why 
this problem can't be solved by brute force 
with a computer. But a deck of cards can be 
arranged in almost as many ways as there 
are atoms in the Universe. All the computers 
in the world couldn't run through all the 
arrangements. We used probability theory, 
combinatorics and group theory to prove 
that you need seven ordinary riffle shuffles 
to mix up a deck randomly. Now we're using 
the mathematics of shuffling to study tur- 
bulence in fluid dynamics, which has many 
industrial applications, such as determin- 
ing how long a vat of cookie dough must be 
blended to ensure that all of the ingredients 
are mixed. 
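
Both claims in this answer can be put into numbers. The sketch below first counts the orderings of a 52-card deck, then evaluates how far a deck is from random after k riffle shuffles. It relies on the formula published by Dave Bayer and Persi Diaconis in 1992, which is not quoted in the interview: after k shuffles, an ordering with r rising sequences has probability C(2^k + n - r, n) / 2^(kn). The shuffle model, the use of total variation distance as the measure of mixing, and the function names are assumptions made for this illustration.

```python
from fractions import Fraction
from math import comb, factorial


def eulerian_row(n):
    """row[m] = number of orderings of n cards with m descents."""
    row = [1]
    for size in range(2, n + 1):
        prev = row
        row = [0] * size
        for m in range(size):
            left = prev[m - 1] if m >= 1 else 0
            right = prev[m] if m < len(prev) else 0
            row[m] = (size - m) * left + (m + 1) * right
    return row


def tv_distance(n, k):
    """Total variation distance from uniform after k riffle shuffles."""
    a = 2 ** k
    uniform = Fraction(1, factorial(n))
    total = Fraction(0)
    # The number of orderings with r rising sequences is the
    # Eulerian number with r - 1 descents.
    for descents, count in enumerate(eulerian_row(n)):
        r = descents + 1
        p = Fraction(comb(a + n - r, n), a ** n)
        total += count * abs(p - uniform)
    return total / 2


# Number of ways to arrange 52 cards, roughly 8 x 10^67.
print(f"{factorial(52):.3e}")

for k in range(1, 11):
    print(k, round(float(tv_distance(52, k)), 3))
```

In the published table the distance stays close to 1 through four shuffles and is about 0.334 at seven, the first point at which it falls below one half. That is one common way of making the statement about seven shuffles precise.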


Are there any other practical applications of 
card tricks? 

Jordan invented a method of ordering a deck 
such that the pattern of reds and blacks in a 
series would code for a unique set of cards, 
allowing him to divine cards on very little 
information. Such an arrangement is known 
as a De Bruijn sequence among mathemati- 
cians. These sequences have lots of practical 
applications. They are used to scramble 
mobile-phone signals, reassemble snippets 
of DNA and allow a digital pen to identify 
its position on special paper. 
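
The defining property of a De Bruijn sequence can be shown in a few lines. The sketch below is an illustration, not Jordan's arrangement: it uses a standard recursive construction to build a cyclic red/black sequence of length 32 in which every run of five consecutive colours occurs exactly once. The window length of five, the letters 'R' and 'B', and the function names are assumptions chosen for the example.

```python
def de_bruijn(k, n):
    """Cyclic sequence over k symbols containing every length-n word once."""
    a = [0] * (k * n)
    sequence = []

    def extend(t, p):
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            extend(t + 1, p)
            for symbol in range(a[t - p] + 1, k):
                a[t] = symbol
                extend(t + 1, t)

    extend(1, 1)
    return sequence


def window_index(seq, n):
    """Map each cyclic window of length n to the position where it starts."""
    doubled = seq + seq[:n - 1]
    return {tuple(doubled[i:i + n]): i for i in range(len(seq))}


colours = ["R" if bit else "B" for bit in de_bruijn(2, 5)]
lookup = window_index(colours, 5)

print("".join(colours))
# Every one of the 32 possible five-colour patterns appears exactly once,
# so five consecutive colours pin down a unique place in the cycle.
print(len(colours), len(lookup))
```

The same lookup idea is what lets a reader of the sequence recover its position from a short local sample.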


Could science explain what makes a good 
magic trick? 

The psychology of deception is a serious 
subject, particularly among spy agencies, 
but I have never seen a convincing study of 
it. Some magic tricks are viscerally moving 
and shocking, others are painfully boring. 
Dai Vernon said that good magic has a way 
of “ingeniously leading the mind to defeat its 
own logic”. That sort of thing is not so scien- 
tific. Magic is a theatrical experience. 


How are secrets treated among magicians? 
It is a strange tension. In magic there are 
still many secrets. I'm famous for keeping 
them. If someone shows me something, 
it stays with me forever. When a student 
asked me recently to talk about card tricks, 
I declined. But Wikipedia and YouTube 
are changing things. Someone can find the 
magician’s secret on their phone during a 
performance. Maybe this will make magi- 
cians more inventive, or make people more 
appreciative. 


What kind of maths do you prefer? 

I enjoy learning new things. When you start 
in a new field you have to ask dumb ques- 
tions. I often say I’m paid for my ability to 
tolerate feeling stupid. I also like problems 
that touch the real world, and that you can 
explain simply, like flipping a coin or spin- 
ning a roulette wheel. When I develop a big 
piece of theory, I feel like I’m slumming: the 
real discoveries are in the examples. This 
spring, I will teach a course in the mathemat- 
ics of magic tricks, in an effort to twist young 
minds in the right direction. 


INTERVIEW BY JASCHA HOFFMAN 
Correspondence 


Iraq’s unseen burden 
of wounded civilians 


The burden of war-related mental 
disorders is well documented 
among US veterans (Nature 477, 
390-393; 2011), but not among 
civilians in Iraq. This oversight 
must be rectified so that adequate 
medical support can be provided 
to the Iraqi people. 

US combat troops will soon 
depart Iraq, leaving Iraqis to 
cope with the consequences of 
the 2003 invasion. Although 
the number of violent deaths 
is falling, civilians have been 
killed almost every day this year, 
most of them in coordinated 
bomb attacks. Roadside blasts 
cause long-term disabilities 
and societal effects among 
injured civilians. However, 
these have been largely 
neglected by the media and no 
systematic surveillance has been 
undertaken. 

Despite Iraq’s damaged health- 
care system, primary repairs of 
many injuries are being carried 
out with acceptable results. But 
some patients require advanced 
procedures that are not available 
in the country. For international 
aid organizations that are unable 
to work safely inside Iraq, one 
operational model is to treat 
patients outside the country. 

Médecins Sans Frontières 
(MSF; also known as Doctors 
Without Borders) started a 
surgical programme in Amman, 
Jordan, in 2006 to provide 
functional reconstructive surgery 
and psychosocial support for 
Iraqi civilians. Although MSF has 
managed progressively to increase 
the capacity of its programme, 
it can still accommodate only a 
small fraction of the Iraqi civilians 
in desperate need of advanced 
surgical care. 

Development of standardized 
data-collection tools would 
greatly improve future 
monitoring of mental health and 
of explosion injuries. Culturally 
adequate interventions and 
tailored support networks are 
needed to alleviate the long- 
term physical and psychological 


repercussions of exposure to 
war-related trauma. Above all, 
efforts must concentrate on 
improving security for civilians. 
Gilles Guerrier, Emmanuel 
Baron Epicentre, Paris, France. 
guerriergilles@gmail.com 
Rasheed Fakri Médecins Sans 
Frontières, Amman, Jordan. 
Isabelle Mouniaman Médecins 
Sans Frontières, Paris, France. 


Brazil’s forest code 
puts wetlands at risk 


Brazil's revisions to its Forest 
Code threaten not only the 
Amazon rainforest but also its 
wetlands (Nature 476, 259-260; 
2011). Many Brazilian flood 
plains extend into neighbouring 
countries, so they could also be 
affected. 

Seasonal rainfall causes the 
levels of most Brazilian rivers 
to fluctuate. Flood plains reach 
widths of tens of metres along 
small streams and tens of 
kilometres along large rivers, and 
up to 90% of these dry up during 
periods of low rainfall. 

These wetlands provide 
the environment and humans 
with important services, such 
as water storage, discharge 
buffering, water clearing, 
sediment retention, recharging 
of the groundwater level, local 
and regional climate regulation, 
and maintenance of a large 
biodiversity. Some provide 
homes and livelihoods for 
traditional human populations 
as they harbour important fish 
stocks and can also be managed 
for low-density cattle ranching 
and timber production. 

Neither the old nor the new 
version of the Forest Code 
specifically mentions wetlands. 
The old code protects forests 
along streams and rivers, 
according to the river’s width and 
maximum water level, thereby 
integrating and protecting the 
wetland areas. The new code 
protects areas only to a poorly 
defined “regular” water level, 
opening up opportunities for 
the destruction of high-lying 
wetland areas. This will damage 
the integrity of the remaining 
low-lying areas, along with most 
of their benefits for humans and 
the environment. 

The Intergovernmental Panel 
on Climate Change predicts 
that large parts of Brazil will 
experience heavier rainy 
seasons and more severe dry 
periods interspersed with heavy 
rainfall. Buffering by intact 
wetlands will be increasingly 
important as water availability 
and distribution become the 
limiting factors for agricultural 
development and the well-being 
of rural and urban populations. 
Paulo Teixeira de Sousa Jr 
National Institute for Science and 
Technology in Wetlands (INAU), 
Cuiabá, Mato Grosso, Brazil. 
pauloteixeiradesousa@gmail.com 
Maria Teresa Fernandez 
Piedade National Institute of 
Amazonian Research (INPA), 
Manaus, Amazonas, Brazil. 
Ennio Candotti Museum of 
Amazônia, Manaus, Amazonas, 
Brazil. 


Small colleges aided 
by research networks 


As faculty members of primarily 
undergraduate institutions 
(PUIs), we have successfully 
developed research programmes 
with our students despite funding 
and collaborator limitations 
(Nature 477, 239-241; 2011). 
Long-term research at PUIs is 
challenging, but feasible. 

To improve our research 
productivity, we have set up 
collaborative research networks 
with other PUIs. Through our 
Ecological Research as Education 
Network (EREN, comprising 
72 PUIs), we are helping 
each other to develop grant 
proposals, research protocols, 
data sets and manuscripts, and 
are teaching students through 
multi-institutional, collaborative 
research (D. R. Bowne et al. 
BioScience 61, 386-392; 2011). 

Internal research funding, 
including start-up packages, 
varies widely among institutions. 


In a survey of 50 ecology 
faculty members from PUIs 
at the Ecological Society of 
America’s 2011 annual meeting, 
30% of respondents reported 
no internal support for research 
equipment and supplies, and 
51% said there was none for 
student-researcher stipends. 
Inter-institutional networks 
enable faculty members to 
share research resources at 
minimal cost. 

PUI faculty members are 
creative in seeking research 
funding. For example, the same 
survey revealed that 94 funding 
sources had been successfully 
accessed, including US 
government agencies. 

PUI faculty members often 
collect long-term data during 
undergraduate courses and in 
independent research, although 
better coordination is needed 
to enhance the scientific and 
educational impact of this 
work. Members of EREN have 
shared research protocols with 
many institutions to answer 
continental-scale questions. 
Erin S. Lindquist Meredith 
College, Raleigh, North Carolina, 
USA. erinlind@meredith.edu 
Laurel J. Anderson Ohio 
Wesleyan University, Delaware, 
Ohio, USA. 
Jeffrey A. Simmons Mount St. 
Mary’s University, Emmitsburg, 
Maryland, USA. 


Aboriginal people 
agreed to DNA study 


As research manager of the 
Goldfields Land and Sea Council 
(GLSC), I was involved in your 
discussion of Aboriginal genome 
research (Nature 477, 522-523; 
2011) and would like to make it 
clear that the decision to allow 
analysis of the 90-year-old hair 
sample was made by the duly 
mandated people. The decision 
took proper account of ethical 
research practices and of the 
rights of Aboriginal people to 
safeguard their cultural heritage. 
The GLSC is the representative 
body for the Aboriginal people 
in the region where the sample 
was obtained, and is recognized 
under the Native Title Act 1993. 
The directors are elected by GLSC 
members, and membership is 
open to all Aboriginal residents 
of the region. In granting their 
permission for the research, 
the board exercised properly 
defined moral, cultural and legal 
authority to speak on behalf of the 
Aboriginal people there. 

Most research — be it 
sociological, historical or 
genetic, or even political polling 
— extrapolates from a sample to 
draw conclusions. Participants 
are rarely expected to seek 
consent from their entire group 
before giving up information. 

Because the hair sample was almost certainly given to British 
ethnologist Alfred Cort Haddon voluntarily in the early 1920s, 
this example of an informal exchange between an Aboriginal 
person and a researcher does not provide a model for all such 
exchanges in the future. These should be underpinned by a 
standard indicating that free, prior and informed consent was 
sought from the proper people. 
Craig Muller Goldfields Land and Sea Council, Perth, Western 
Australia, Australia. 
craig.muller@glc.com.au 


Give more priority to 
phosphorus studies 


I agree with James Elser and Elena Bennett that we should 
recycle phosphorus (Nature 478, 29-31; 2011). However, there are 
situations in which the natural recycling of phosphorus is not 
ecologically desirable. 

As the authors note, excess phosphorus in water bodies can 
feed algal blooms and create anoxic zones. What is less well 
known is that these waters can become permanent dead zones, 
stuck in an oxygen-deprived, nutrient-rich state. This happens 
when the algae die, sink and are decomposed by anaerobic 
bacteria that need only limited amounts of phosphorus. Most of 
the algal phosphorus is released back into the water to feed 
further blooms. In the Baltic Sea, for example, reductions 
in phosphorus pollution from rivers have not yet led to 
ecosystem recovery because of this effect. 

We have disturbingly little insight into major phosphorus 
fluxes in the marine realm. This is the legacy of decades of 
research priority being given to the microbial complexities 
of the nitrogen cycle over the methodologically challenging 
investigation of phosphorus cycling. 
Caroline P. Slomp Utrecht University, the Netherlands. 
c.p.slomp@uu.nl 


Boost resilience to 
tackle mental illness 


An economically efficient way of tackling the enormous 
social and economic costs of mental ill health (Nature 477, 
132 and 478, 15; 2011) would be to boost ‘resilience’ to mood 
disorders. 

Mentally healthy individuals often show a positive affective 
bias because their processing of negative information 
is inhibited. This effect, possibly mediated by the 
neuromodulator serotonin, promotes resilience by 
dampening the stress associated with negative life 
experiences. 

Devising ways to promote such resilience in healthy 
individuals could help to prevent chronic stress-related 
brain disorders, saving huge amounts of money and 
heartache every year. 
Oliver J. Robinson National Institute of Mental Health, 
Bethesda, Maryland, USA. 
robinsonoj@mail.nih.gov 

Disclaimer: Views presented in this Correspondence are solely 
those of the author and do not necessarily represent the views 
of the US federal government. 


NATURE’S READERS COMMENT ONLINE 

Selected responses to ‘Fund people not projects’ by John P. A. 
Ioannidis (Nature 477, 529-531; 2011). 


Yiding Zhao says: 
‘Fund people not projects’ was once the model used in 
China, but major international journals frowned on it because 
it risked creating Xue ba (scientific autocracy that 
suppresses others’ ideas). So we worked hard to adapt the 
grant-based model. Now you are telling us the grant-based 
model is worse? 
yidingzhao@pku.edu.cn 


Ken Whitmire says: 
A way to fund people rather than projects would be to allocate 
money directly to individual graduate students and postdocs 
through fellowships, instead of funnelling it through a principal 
investigator's grant. This redirection wouldn’t cost the 
system any more money, and it would make it clear to 
non-scientists that fellowships are funding the training of a 
highly skilled technical workforce, as well as helping a research 
enterprise. Students would have more independence in choosing 
an adviser and advisers would be under less pressure to raise 
huge sums of money to support an active research group. 
whitmir@rice.edu 


Sander Heinsalu says: 
Funding models have trade-offs. Detailed checks create 
bureaucracy, but avoid misuse of money. Specific goals limit 
creativity, but avoid funding less useful projects. I would like 
to see scientific evidence on which funding models generate 
better output — although the best definition of output is also 
debatable. 
sanderheinsalu@hot.ee 


Adrian Barnett says: 
One option is to fund projects retrospectively, with money 
being handed out for work delivered (including papers, 
policy changes, improvements in health), rather than for 
promises made in grant applications. Most current 
grant systems are heavily biased towards senior staff, 
but this scheme would work irrespective of applicants’ 
status. The process would be less burdensome for 
researchers because it would involve gathering their existing 
evidence and costs. 
a.barnett@qut.edu.au 


Craig Macfarlane says: 
The funding model is largely irrelevant — what really 
matters is the amount of money. Whatever the system 
is, scientists are smart enough to learn to play it, and it will 
be dominated by established players who are closest to its 
centre. The only solutions are for developed countries to 
increase public funding for research to ensure that it is 
not just the heavyweights who receive grants, and to support 
more public-good research. Anything else is just fiddling 
at the margins. Until that happens (when hell freezes 
over perhaps), get used to things the way they are. 
fisheye@iinet.net.au 


To join this debate, go to 
http://go.nature.com/ciahh5. 
OBITUARY 


Ralph Steinman 


(1943-2011) 


Immunologist and cheerleader for dendritic-cell biology. 


Ralph Steinman changed the world 
of immunology when he discovered 
dendritic cells, but it took the field a 
long time to recognize the importance of 
his discovery. The idea that a new type of 
immune cell could be found in 1973 — in 
the era of molecular cell biology — simply 
by looking down a microscope seemed 
far-fetched. The early criticism was relent- 
less. Steinman’s road to the Nobel Prize in 
Physiology or Medicine — awarded (unu- 
sually) just days after his death — was full 
of obstacles. That the journey was even 
possible was down to his forceful person- 
ality, energy and focus. 

Steinman was born in 1943 in Sherbrooke, 
Quebec, Canada, as the second son in a 
family of Jewish immigrants originating 
in Moldavia and Poland. The Steinmans 
owned Mozart’s, a general store selling 
everything from appliances to clothing. 
His parents wanted him to study religion 
and take over the family business, but sum- 
mers working in the store reinforced Stein- 
man’s desire to do something else. His love 
of science led him to McGill University in 
Montreal, Canada, then to Harvard Medi- 
cal School in Boston, Massachusetts, where 
in the late 1960s he heard lectures by Kurt 
Bloch on the initiation of immunity and 
studied Peter Medawar’s work on toler- 
ance and Frank Macfarlane Burnet’s ideas 
on clonal selection. 

Although Steinman did a residency in internal medicine at 
Massachusetts General Hospital in Boston, he was drawn to basic 
research. In 1970, he joined Zanvil Cohn and James Hirsch’s 
laboratory at Rockefeller University in New York to work on the 
initiation of immune responses. Cohn and Hirsch were focusing on 
macrophages, but Steinman was also influenced by his campus 
neighbours: Christian de Duve, George Palade, Philip Siekevitz, 
David Sabatini and Günter Blobel were inventing modern cell biology 
a few floors above. Steinman soon characterized pathways for the 
engulfing of molecules by cells (endocytosis) and, with Cohn, 
proposed the involvement of membrane recycling in this process. 

“As a basic scientist he recognized the enormous challenge of 
taking a discovery from the laboratory to the patient.” 


At that time, in the early 1970s, immu- 
nologists were developing culture systems to 
help study the cellular basis of immunity. An 
early finding was that, in addition to B and T 
lymphocytes, antibody responses required 
another type of cell, dubbed an ‘accessory 
cell’. The mysterious accessory cells stuck to
glass, so Steinman — inspired by his cell- 
biology colleagues’ emphasis on microscopy 
— decided to examine glass-adherent spleen 
cells using phase contrast, live imaging and 
electron-microscopy techniques. 

What he saw under the lens was a new 
type of immune cell that had branching, 
rapidly changing projections. Three experi- 
ments (on two of which, one of us, MCN, 
was involved) then convinced Steinman 
that these ‘dendritic cells’ were the missing
accessory cells: they could induce T-cell 
division and initiate killer-T-cell responses 
to antigens, and were biochemically distinct 
from macrophages. 

For decades, Steinman was the consensus 
leader of, and the most enthusiastic cheer- 
leader for, the field of dendritic-cell biol- 
ogy. He brought in numerous scientists 
from other fields, and loved to collabo- 
rate. With Wesley van Voorhis, he showed 
that dendritic cells exist in the blood of 
humans. With Kayo Inaba, he established, 
among other things, that when loaded 
with antigen, dendritic cells could induce 
anti-tumour immunity in mice. With 
Gerald Schuler, he saw that dendritic cells 
could be activated by pathogens to initiate
immunity. 

Steinman was generous and much loved 
by his colleagues. Even when he was ter- 
ribly ill, he spent the little time he had left 
ensuring that his students and fellows 
would land on their feet after he passed. 

He was passionate about his work as 
editor of the Journal of Experimental Medi- 
cine, a responsibility that he enjoyed for 
more than 40 years. His focus on publish- 
ing outstanding science had a profound 
influence on his field. 

And he was equally passionate about 
making the leap from the bench to the 
bedside. In recent years, he tried to use 
dendritic cells to develop vaccines. As a 
basic scientist he recognized the enormous 
challenge of taking a discovery from the 
laboratory to the patient. But he relished 
the task because of its importance in 
treating infectious diseases and cancer. 
Steinman’s passion continued even when 
it was him in the bed, receiving dendritic-cell
therapy of his own design.


Michel C. Nussenzweig is a Howard Hughes 
Medical Institute investigator at Rockefeller 
University. Ira Mellman is at Genentech and 
the University of California, San Francisco. 
Both trained with Steinman early in their 
scientific careers. 

e-mails: nussen@mail.rockefeller.edu, 
mellman.ira@gene.com 
Figure 1 | Cold clouds. Manney et al. report that Arctic stratospheric temperatures were low enough for polar stratospheric clouds, such as the ones shown
here over Swedish Lapland, to form through much of the winter of 2010-11. These clouds have a crucial role in polar ozone destruction.


ATMOSPHERIC SCIENCE


An Arctic ozone hole? 


The observation of unusually low ozone levels over the Arctic last winter provides reassuring evidence that our knowledge 
of stratospheric chemistry is robust. Whether such an episode will happen again is an open question. SEE ARTICLE P.469 


ROLANDO R. GARCIA 


In late March of this year, the thickness of
the ozone column in the atmosphere above
the Arctic dropped to minimum values
of 220-230 Dobson units (DU). Although
these values are much higher than the lowest
values seen routinely since the 1990s for the
Antarctic spring (about 100 DU), they are
nonetheless among the lowest ozone-column
figures ever observed in the Arctic. These
values were recorded for just one week in late
March, but minimum column amounts of
less than 250 DU were observed for almost a
month. Reporting on page 469 of this issue,
Manney et al. argue that the very low ozone
observed over the Arctic in March constitutes
the first ever ‘ozone hole’ in the Northern
Hemisphere.

The authors support their argument by
showing that the chemistry of the Arctic
stratosphere in the spring of 2011 was remarkably
similar to that commonly observed in
Antarctica, which routinely leads to the development
of the well-known Antarctic ozone
hole. In particular, the abundances of nitric
acid (HNO₃) and hydrochloric acid (HCl) were
exceptionally low in March, and close to values
usually seen only in Antarctica (see Fig. 2 of
the paper, which shows observations of these
and other species, including ozone, made by 
the Microwave Limb Sounder Instrument 
on NASA’s Aura satellite). At the same time,
the abundance of chlorine monoxide (ClO),
which is an efficient catalyst of ozone destruc- 
tion, rose throughout the Arctic winter and 
remained at record high levels during March 
— levels that are comparable to those seen in 
Antarctica in September, the equivalent point 
in the seasonal cycle. 

The high levels of catalytic chlorine- 
containing species such as ClO led to rapid 
ozone loss over the Arctic from mid-February 
through into March. In mid-to-late March, 
the ozone abundance at altitudes near 20 kilo- 
metres dropped to about 1.5 parts per million 
by volume (p.p.m.v.). This is in the upper range 
of values observed in Antarctica in Septem- 
ber, but well below typical values for the Arctic. 
By the end of March, the abundance of ozone 
was 1-1.5 p.p.m.v. at altitudes between 15 and 
20 km; again, this resembles what happens 
during the Antarctic spring, and is much lower 
than normal for the Arctic. 

Manney et al. show that the unusual evolu- 
tion of atmospheric chemistry observed in the 
Arctic was made possible by the occurrence 
of persistently low stratospheric temperatures 
from December 2010 through to March 2011. 
There have been other exceptionally cold 
Arctic winters in the recent past: 1995-96,
1999-2000 and 2004-05 were as cold as last 
winter from December through to February, 
and 1996-97 was just as cold during March 
and April. But only in 2010-11 were very cold 
conditions sustained continuously for a period 
of four months. Minimum temperatures 20 km 
above the Arctic remained below 195 K from 
mid-December 2010 through to the end of 
March 2011. This value is crucial, because it is 
the threshold for the formation of polar strato- 
spheric clouds (PSCs; Fig. 1). PSC particles 
provide sites for heterogeneous (gas-solid and
gas-liquid) reactions that ‘activate’ chlorine
by freeing it from the reservoir species HCl
and chlorine nitrate (ClONO₂). Indeed, widespread
formation of PSCs was documented
by Manney and colleagues. This is consistent
with their observations of high levels of
ClO and low levels of HNO₃ and HCl, because
heterogeneous reactions on PSCs deplete
levels of HCl and ‘denitrify’ the stratosphere
by removing HNO₃ from the gas phase.

Did ozone depletion in 2011 constitute a true 
Arctic ozone hole? As unusual as this year was, 
ozone loss did not approach the magnitude of 
depletion seen in Antarctica, whether meas- 
ured locally or in terms of the total ozone col- 
umn. In Antarctica, ozone virtually disappears 
at altitudes between 12 and 24 km, and in some 


D. H. JONES/SPL 


years the depth of the ozone column is reduced 
to 100 DU or even less. Such severe depletion
was not observed in the Arctic. Judged by these 
criteria, there was no Arctic ozone hole in 2011 
— only the most extreme episode of ozone 
loss seen in the Arctic so far. However, the 
evolution of HNO₃, HCl and ClO species was
strikingly Antarctic-like and different from 
what has been observed in other Arctic 
winters. On the basis of these considerations, 
Manney et al. conclude, reasonably enough, 
that there was an Arctic ozone hole in 2011. 
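
For a sense of scale, the column amounts quoted in this article can be converted into an equivalent layer thickness. The sketch below assumes the standard definition of the Dobson unit (1 DU is a layer of pure ozone 0.01 mm thick at standard temperature and pressure, or about 2.687 × 10^16 molecules per square centimetre), which the text does not spell out; the function and variable names are illustrative only.

```python
# Conversion factors from the standard definition of the Dobson unit
# (assumed here; the article itself only quotes values in DU).
MM_PER_DU = 0.01                     # layer thickness at STP, in millimetres
MOLECULES_PER_CM2_PER_DU = 2.687e16  # ozone molecules per square centimetre


def column_thickness_mm(dobson_units: float) -> float:
    """Thickness of the equivalent pure-ozone layer at STP, in millimetres."""
    return dobson_units * MM_PER_DU


def column_density_per_cm2(dobson_units: float) -> float:
    """Number of ozone molecules above one square centimetre of surface."""
    return dobson_units * MOLECULES_PER_CM2_PER_DU


# Column amounts quoted in the article, in Dobson units.
QUOTED_COLUMNS_DU = {
    "Arctic minimum, late March 2011 (low end)": 220.0,
    "Arctic minimum, late March 2011 (high end)": 230.0,
    "Arctic, sustained for almost a month (upper bound)": 250.0,
    "Antarctic spring (approximate low value)": 100.0,
}

if __name__ == "__main__":
    for label, du in QUOTED_COLUMNS_DU.items():
        print(
            f"{label}: {du:.0f} DU = {column_thickness_mm(du):.2f} mm, "
            f"{column_density_per_cm2(du):.3e} molecules/cm^2"
        )
```

On this reckoning, even the record-low Arctic column corresponds to a layer a little over 2 mm thick, roughly twice the Antarctic figure of about 1 mm.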

Was the ozone loss seen in 2011 a truly 
exceptional event, or should we expect repeated 
episodes of extensive Arctic ozone depletion as 
a result of climate change? Although increas- 
ing concentrations of carbon dioxide in the 
atmosphere warm the troposphere, they cool 
the overlying stratosphere, and observations
have shown that the global stratosphere has 
cooled significantly during the past 30 years. 
Furthermore, there is evidence to suggest that 
the coldest Arctic winters are becoming colder
— although there is no statistically significant 
trend because Arctic winter temperatures are 
highly variable. Comprehensive models of 
the climate system do not produce consistent 
predictions for the evolution of Arctic temperatures
and ozone loss in the twenty-first century;
the refinement of such predictions will
undoubtedly be an important topic of research 
in the immediate future. Another considera- 
tion is that the abundance of ozone-destroying 
chlorine- and bromine-containing species in 
the atmosphere is decreasing following the 
adoption of the Montreal Protocol, which bans 
substances that deplete the ozone layer, such
that the impact of cold winters on stratospheric 
ozone will lessen steadily in the future. 

All things considered, a repeat of this year’s 
episode is possible, but not likely. Nonetheless, 
the behaviour of Arctic ozone in 2011 is a nice 
demonstration that our basic understanding 
of stratospheric chemistry is robust, and that 
extensive ozone loss will occur wherever the 
right conditions are present.


Rolando R. Garcia is in the Atmospheric 
Chemistry Division, National Center for 
Atmospheric Research, PO Box 3000, Boulder, 
Colorado 80307-3000, USA. 

e-mail: rgarcia@ucar.edu 
Suicide of a protein


Plants and fungi follow a complex route to make the vitamin thiamine for 
carbohydrate metabolism. One of the pathway’s protein participants turns out to 
be a surprising player, sacrificing its own activity in the process. SEE LETTER P.542


PETER ROACH 


Thiamine is an essential cofactor that
allows important enzymes to catalyse
metabolic reactions. Humans depend
on their diet to supply it as vitamin B₁, but
bacteria, plants and yeast can make their own. 
They do this by bringing together two pre- 
cursor molecules: a sulphur-containing ring 
structure known as a thiazole and a nitrogen- 
ous pyrimidine. But the biochemical origin of 
the thiazole component of thiamine in plants 
and fungi has been mysterious — until now. 
On page 542 of this issue, Chatterjee and 
co-workers describe a highly unusual reaction
step in which a sulphur atom is transferred 
from a protein’s amino-acid residue, to become 
part of the thiazole precursor. The donor pro- 
tein is therefore functioning as a reagent, or 
‘suicide enzyme’, rather than as a conventional
catalytic enzyme. 

The biosynthesis of the precursor thiazole 
starts with the metabolic cofactor molecule 
nicotinamide adenine dinucleotide (NAD);
in the yeast Saccharomyces cerevisiae, the only 
protein required for the biosynthetic pathway 
is known as THI4p. The co-purification with
THI4p of compounds such as 3 (an adenylated
carboxythiazole molecule; Fig. 1), which
contains the same carbon skeleton as NAD,
provided the first clues to NAD’s function in
this context. Chatterjee et al. found that one
amino-acid residue — the cysteine at position 
205 (Cys 205) of the THI4p sequence — loses 
a sulphur atom from its side chain during the 
biosynthesis, and so is converted to a dehydro- 
alanine (Dha) residue. They concluded that the 
protein was probably providing the sulphur 
atom for thiazole biosynthesis. 

Experimental evidence for a protein's role in 
a particular metabolic step is usually provided 
by reconstitution of the biosynthetic activity 
in vitro using the purified protein and a well- 
defined set of chemical ingredients. This was 
not easy to demonstrate for THI4p, because 
by the time it had been purified it had already 
lost the sulphur atom from Cys 205 — presum- 
ably as a result of undergoing reaction in the 
growing cells before purification. Chatterjee 
et al. got around the problem by culturing 
the cells on a growth medium containing the 
absolute minimum of nutrients: these cells 
yielded intact and unmodified THI4p pro- 
tein. A paucity of reaction intermediates in 

Figure 1 | Incorporation of sulphur into a thiamine precursor. The biosynthesis of the cofactor
thiamine requires an adenylated carboxythiazole molecule (3) as a building block. Chatterjee et al. report
that, in yeast, the source of the sulphur atom in 3 is the cysteine amino-acid residue at position 205 of
the protein THI4p. In the presence of iron(II) (Fe²⁺), Cys 205 reacts with precursor molecule 1 to afford
intermediate compound 2. The Cys 205 residue is converted into a dehydroalanine (Dha) residue in the
process, thereby inactivating THI4p. Intermediate 2 then loses two molecules of water to produce 3. This
use of a protein as a co-substrate for a biosynthesis, rather than as a catalytic enzyme, is highly unusual.
ADP is adenosyl diphosphate, a nucleotide group.

the medium after cell growth indicated that a
component required for THI4p activity might
be missing. However, addition of iron(II)
(Fe²⁺) to the minimal-growth medium markedly
increased the formation of compound 3
(Fig. 1). The reaction is extremely selective
for Fe²⁺, so the authors concluded that Fe²⁺
must be necessary for the sulphur-transfer
reaction.

The thiazole-forming reaction could now 
be reconstituted in vitro from NAD using 
intact THI4p protein. The reaction showed 
conversion of NAD through to compound 1 
(Fig. 1), and addition of Fe²⁺ completed the
transformation to the final thiazole product, 3. 
Chatterjee et al. found that each molecule of 
THI4p accounted for the production of a 
single molecule of thiazole, observing a clear 
1:1 correlation between the loss of sulphur 
from THI4p and formation of the thiazole 
product. 

The deployment of a protein as such a metabolic
reagent is extremely rare in biology. One
of the few well-characterized examples is the
protein Ada, which repairs methylated lesions
in DNA. Reactivation of THI4p containing
the Dha residue has so far proved impos- 
sible. Experiments with cultured cells show 
that thiamine and THI4p accumulate in a 1:1 
ratio during growth, consistent with THI4p 
functioning as a ‘one-shot’ reagent and not 
being recycled by the cellular machinery. The 
reaction results in the build-up of inactivated 
(Dha 205) THI4p until it comprises about 1.5% 
of total cellular protein.

These observations raise interesting questions,
such as why a protein should be used as
a sulphur donor and why the inactive protein 
should not undergo rapid degradation and its 
amino acids be recycled. As Chatterjee et al. 
note, this accumulation of THI4p implies 
that the inactive protein has another cellular 
function. Indeed, inactivated THI4p has been 
implicated in DNA protection and other stress 
responses. The biochemical mechanism
linking the biosynthetic and stress-response
roles of THI4p is as yet unknown, but one
possibility posed by Chatterjee et al. is that the
protein helps to mop up any excess intracellular
Fe²⁺, thereby preventing chemical reactions
that can produce damaging reactive oxygen
species.


Eris under scrutiny


A stellar occultation by the dwarf planet Eris provides a new estimate of its size.
It also reveals a surprisingly bright planetary surface, which could indicate the
relatively recent condensation of a putative atmosphere. SEE LETTER P.493


AMANDA GULBIS


Pluto has long reigned as the smallest and
most distant planet in our Solar System.
In 1992, the discovery of another body
in roughly the same region proved that Pluto
was not alone and launched a new frontier of
study in planetary science: the Kuiper belt,
the region at and beyond Neptune's orbit in
which the two bodies reside. More than 1,000
Kuiper-belt objects have since been discovered.
These objects consist mainly of ices and
are typically located away from Earth at 30-50
times the distance between Earth and the Sun.
One particular object, later named Eris, caught
the attention of astronomers because initial
size estimates based on proposed values for its
surface reflectance showed that it was probably
larger than Pluto. Subsequent direct imaging
suggested that Eris was about 5% larger
than Pluto, whereas detection of Eris’s thermal
radiation indicated that the planet was
substantially larger than Pluto. On page 493
of this issue, Sicardy et al. use the powerful
technique of stellar occultation to derive the
most detailed information to date on Eris’s size
and other physical traits.

The discovery of Eris in the Kuiper belt
sparked the realization that more planets in
the outer reaches of the Solar System were
probably awaiting detection. A great debate
ensued as to whether Eris and other large,
yet-to-be-detected Kuiper-belt objects (even
Pluto itself) should be considered planets.
This debate forced the astronomical community
to rethink the definition of a planet. As
a result, the definition was changed in 2006
and both Pluto and Eris were reclassified as
dwarf planets. Rarely in the course of modern
research has an astronomical discovery generated
such widespread debate and emotional
reaction among both scientists and the general
public; it turns out that Eris was aptly named
after the Greek goddess of strife and discord.

The stellar-occultation technique, in which
a star is observed to pass behind a foreground
object (in this case, Eris), has proven to be an
effective method for discovering and characterizing
features of Solar System bodies. Only a
handful of large Kuiper-belt objects have been
observed to occult stars, and each occultation
has revealed something new and interesting,
such as an unexpectedly bright surface on
Kuiper-belt object 55636 (ref. 6), or waves in
the upper atmosphere of Pluto. These results
have led to reconsideration of ideas about how
objects in the outer Solar System are formed
and evolve. Sicardy et al. predicted and
observed the occurrence of a stellar occultation
when Eris was nearly 100 times as distant
from Earth as Earth is from the Sun. Eris is
the most distant Solar System object to be suc- 
cessfully observed by this technique, a notable 
achievement. 

The authors find that Eris has a radius of
1,163±6 kilometres, which is smaller and has
significantly less margin of error than the previous
measurements of 1,200±100 km (ref. 3)
and 1,500±200 km (ref. 4). Ironically, Sicardy
and colleagues’ results cannot definitively 
state whether Eris is larger than Pluto. The 
ambiguity is due to the fact that Pluto's atmos- 
phere prevents accurate measurement of its 
surface location, rather than being due to any 
deficiency in the Eris observations. A more 
interesting result from the authors’ study is
the possibility that Eris has a collapsed atmos- 
phere (frozen to the surface), or a localized 
atmosphere under certain conditions. 
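
The comparison of radius estimates above can be made concrete with a little arithmetic. The sketch below treats each quoted uncertainty as a simple symmetric interval, which is an assumption (the article does not say how the error bars are defined), and the class and function names are purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RadiusEstimate:
    """A radius quoted as value ± error, both in kilometres."""

    label: str
    value_km: float
    error_km: float

    @property
    def low(self) -> float:
        return self.value_km - self.error_km

    @property
    def high(self) -> float:
        return self.value_km + self.error_km

    @property
    def relative_error(self) -> float:
        return self.error_km / self.value_km


def intervals_overlap(a: RadiusEstimate, b: RadiusEstimate) -> bool:
    """True if the two value ± error ranges share at least one point."""
    return a.low <= b.high and b.low <= a.high


# Values as quoted in the article.
OCCULTATION = RadiusEstimate("Sicardy et al. (stellar occultation)", 1163.0, 6.0)
EARLIER = [
    RadiusEstimate("ref. 3", 1200.0, 100.0),
    RadiusEstimate("ref. 4", 1500.0, 200.0),
]

if __name__ == "__main__":
    for estimate in [OCCULTATION, *EARLIER]:
        print(
            f"{estimate.label}: {estimate.low:.0f} to {estimate.high:.0f} km "
            f"(relative uncertainty {estimate.relative_error:.1%})"
        )
    for earlier in EARLIER:
        verdict = "overlaps" if intervals_overlap(OCCULTATION, earlier) else "does not overlap"
        print(f"Occultation range {verdict} the range from {earlier.label}")
```

Under that assumption, the new value sits inside the range of ref. 3 but below the range of ref. 4, and its relative uncertainty is about 0.5%.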

Sicardy et al. demonstrate that Eris has no 
atmosphere at present and that a surprisingly 
large amount of light is reflected from its sur- 
face. This unusually bright surface is difficult 
to reconcile with the idea that objects in the 
outer Solar System become darkened by cos- 
mic rays and micrometeorite impacts over 
time. The new observations could thus support
a long-standing theory that, as a large, icy,
Kuiper-belt object approaches the Sun during 
its orbit, a putative atmosphere could sublimate 
and then condense out as the object moves 
farther away. Eris is currently far away from 
the Sun in its 557-year orbit, and although 
the results do not prove that an atmosphere 
ever existed, the bright surface could indicate 
relatively recent condensation. 

The discovery of Eris and other Kuiper-belt 
objects allows Pluto, a seemingly unique object 
for so many years, to be placed in a broader 
perspective. Investigation of the bulk properties
of a large number of Kuiper-belt objects
has provided insight into the formation,
evolution and dynamical histories of these 
bodies. We now expect that there are hundreds 
of objects that will eventually be classified 
as dwarf planets. Because Eris is the only 
body similar to Pluto for which detailed 
stellar-occultation data are available, Sicardy 
and colleagues’ results represent a major
step forward in our knowledge about large 
Kuiper-belt objects. 

Future attempts will certainly be made to
obtain additional stellar-occultation data for
Eris as well as for other Kuiper-belt objects.
Whether they are called planets or not, there
is clearly still much to learn about these distant,
icy bodies.
Plague’s progress


The Black Death was one of the most devastating pandemics in human history. 
The first complete genome sequence of the causative Yersinia pestis bacterium 
provides a fresh perspective on plague evolution. SEE LETTER P.506 


EDWARD C. HOLMES 


The Black Death was a pandemic of almost
unprecedented scale. It is estimated
that 30-50% of Europe’s population
perished from the plague between 1347 and
1351. As might be expected from such a 
remarkable level of mortality, this pestilence 
had a profound impact on medieval society. 
For example, it greatly influenced the depic- 
tion of death in art (Fig. 1), and acted as the 
driving force for the establishment of some 
of the first public-health measures, although 
these were generally futile. It has long been 
supposed that the Black Death, like all plague 
epidemics since, was caused by the bacterium 
Yersinia pestis in a transmission cycle involving 
fleas, rats and humans. But the evolutionary 
relationships between Y. pestis strains from 
different plague epidemics have been less 
clear. On page 506 of this issue, Bos et al.
describe the first complete genome sequence 
of Y. pestis from Black Death victims, and 
show that this pandemic was a pivotal event in 
plague evolution. 

Obtaining bacterial DNA that is almost 
700 years old presents a number of challenges, 
prominent among them the risk of inadvertent 
contamination by DNA from other sources.
Bos et al. extracted bacterial DNA from five 
teeth of plague victims taken from a burial pit 
at East Smithfield (ES) in London, which was 
established at the height of the pandemic in 
1348-49. The key methodological advance 
in their work was the use of a molecular cap- 
ture assay that assisted in the detection of 
Y. pestis DNA amid a background of host and 
environmental DNA. The stringent laboratory 
procedures used by the authors, the observed 
patterns of mutational damage in the DNA, 
and the finding that the ES strain is ancestral
to all contemporary Y. pestis strains on
phylogenetic trees, strongly suggest that the
obtained genome sequence is an authentic
representative of the Black Death pathogen.

Figure 1 | The last rites. In this fourteenth-century
picture, a victim of the Black Death is attacked by
a devil, while a priest reads the last rites and God
watches from above.

The ES strain is significant for several rea- 
sons. First, in combination with studies of par- 
tial genome sequences of Y. pestis from Black 
Death victims, it demonstrates beyond doubt
that Y. pestis was the true cause of the Black
Death, despite claims to the contrary. Second,
the fact that the ES strain falls at the base of 
a phylogenetic tree that links contemporary 
Y. pestis genomes suggests that the Black Death 
was a crucial event in plague evolution that 
generated all later lineages of Y. pestis, including
those responsible for the ‘Asian’ (or modern)
plague pandemic that has spread globally
since the nineteenth century. Interestingly, the
greatest genetic diversity of Y. pestis is observed
in China, suggesting that most plague epidemics
originated in this region. Third, Bos et al.
suggest that the genetic similarity between the 
ES strain and contemporary strains of Y. pestis 
that are associated with less severe epidemics 
indicates that the high mortality of the Black 
Death was not simply a function of the bacterial 
strain involved. Given the small sample size of 
the authors’ study, this assertion must be treated 
with caution, but the availability of a complete 
genome of Y. pestis from the Black Death should 
make the hypothesis experimentally testable. 
Although the recovery and sequencing of the 
ES strain confirms the role of Y. pestis in the 
Black Death, it also raises questions about the 
cause of two earlier major disease pandemics 
previously assigned to Y. pestis: one that spread 
through parts of Africa, Asia and Europe in 
AD 541-542, during the reign of the Roman 
emperor Justinian and, more tentatively, the 
plague of Athens (430-426 BC), which was
evocatively described in the writings of the 
Greek historian Thucydides. Bacterial DNA has 
purportedly been recovered from both of these 
epidemics, although in the case of the plague 
of Athens, the DNA was attributed to the Salmonella
enterica serovar Typhi bacterium that
causes typhoid fever, rather than to Y. pestis. 
However, the Athenian Salmonella strain is not 
actually closely related to that responsible for 
typhoid, suggesting that the ancient DNA had
been contaminated by DNA from a modern, 
soil-living Salmonella species. The cause of the 
plague of Athens therefore remains a mystery. 
The DNA supposedly from the Justinian 
plague is certainly that of Y. pestis, but the 
close similarity between the Justinian DNA
and that of modern Y. pestis variants suggests
that the former strain may not be authentic. 
Importantly, if all contemporary strains 
of Y. pestis are derived from the Black Death, 
as suggested by Bos and colleagues, then
both of the earlier epidemics were caused 
either by a strain of Y. pestis that has left 
no contemporary descendants, or by an 
entirely different organism. Ancient DNA 
may be central to resolving this question. 
The analysis of ancient DNA is a powerful 
tool, providing a unique perspective on the
timescale and rates of microbial evolution, 
and of the host adaptation of microbial popu- 
lations. Ancient DNA also provides a means 
to determine the cause of ancient epidemics, 
particularly when historical descriptions of 
disease symptoms yield ambiguous diagnoses. 
However, the spectre of contamination has 
loomed large, so that the impact of ancient- 
DNA analysis on our understanding of the evo- 
lution of infectious diseases has been relatively 
minor. Up until now, the most important study 
of historical pathogens has been the sequencing
of the virus responsible for the devastating
influenza pandemic of 1918-19, which has
provided insight into viral pathogenesis. The
ES strain of Y. pestis is one of the oldest pathogens
sequenced so far, and the results are some
of the most credible in the field’s history. Bos 
and colleagues’ study certainly sheds light on
one of the most significant events in human 
history, but its greatest significance may be that it 
heralds a new era of research into the genomics 
of ancient bacterial diseases.


Edward C. Holmes is at the Center for 
Infectious Disease Dynamics, Department 
of Biology, Pennsylvania State University, 
University Park, Pennsylvania 16802, 

USA, and the Fogarty International Center, 
Bethesda, Maryland. 

e-mail: echolmes@psu.edu 


Shedding light on the 
fabric of space-time 


The idea that space-time might be fundamentally fuzzy is much debated among 
theorists. A search for signatures of this effect on light from distant cosmic 
sources has come up empty-handed, but shows the potential of this approach. 


GIOVANNI AMELINO-CAMELIA 


One of the most fascinating journeys in
science concerns the evolution of the
description of space and time. Examples
of milestones on this journey are the realization
that time is relative and that space-time
bends in response to the presence of heavy
particles and bodies. Only one feature of the
initial naive conceptualization of space 
and time by Isaac Newton has survived: 
in current theories, space-time is still 
viewed as a smooth entity. However, 
even this last Newtonian pillar is now 
being scrutinized. The main prediction 
of the theory of quantum mechanics, 
which underlies many of the advances of 
fundamental physics in the twentieth 
century, is that the results of a large class 
of measurements are affected by irre- 
ducible uncertainties. And all attempts 
to apply this successful theory to the 
description of space-time suggest that, 
as a result of some of these uncertainties, 
space-time should be fundamentally 
fuzzy. Writing in Astronomy & Astrophysics,
Tamburini et al. describe how they
have searched for an imprint of space- 
time fuzziness on light from sources 
located at large distances from Earth. 

In a fuzzy space-time, particles would
travel much like a car on a bumpy road, 
with the notion of a smooth trajectory 
emerging only if observations cannot 
resolve the effects of the bumps (Fig. 1). Unfortunately,
the magnitude of the ‘bumps’ of space-time
(the scale of space-time fuzziness) is
horrifyingly small. This scale is expected to be
of the order of the minuscule Planck length,
$L_{\mathrm{P}}$, which is about $10^{-35}$ metres and is defined
as $L_{\mathrm{P}} = (\hbar G/c^{3})^{1/2}$, where $G$ is Newton’s constant
of gravity, $\hbar$ is the reduced Planck constant of
quantum mechanics and $c$ is the speed of light.
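
The size of the Planck length follows directly from the definition above. As a rough check, the sketch below evaluates L_P = (ħG/c³)^(1/2) in SI units; the numerical values of the constants are standard reference values supplied here as assumptions, since the article does not quote them, and the function name is illustrative.

```python
import math

# Physical constants in SI units (standard reference values, not from the article).
HBAR = 1.054571817e-34    # reduced Planck constant, J s
G_NEWTON = 6.67430e-11    # Newton's constant of gravity, m^3 kg^-1 s^-2
C_LIGHT = 299_792_458.0   # speed of light, m s^-1


def planck_length_m(hbar: float = HBAR, g: float = G_NEWTON, c: float = C_LIGHT) -> float:
    """Planck length L_P = sqrt(hbar * G / c^3), in metres."""
    return math.sqrt(hbar * g / c**3)


if __name__ == "__main__":
    print(f"Planck length: {planck_length_m():.3e} m")
```

The result is about 1.6 × 10^-35 m, consistent with the order of magnitude quoted in the text.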
Theoretical studies of space-time fuzziness
have been an active area of research since at
least the 1960s, when John Wheeler pointed
out the significance of this feature and coined
the catchy term ‘space-time foam’ to describe 
it. But for a long time there was no experi- 
mental counterpart to these studies, because 
it seemed that ‘bumps’ with a magnitude of 
$10^{-35}$ metres should forever be beyond the
reach of the sensitivity of any experimental 
apparatus. Moreover, devising detailed candi- 
date models of space-time fuzziness is hard, 
because essentially it involves combining the 
complexity of quantum mechanics with the 
intricacies of general relativity.

As a result, phenomenological approaches 
to describing space-time foam started to gain 
momentum only quite recently. A first wave 
of studies attempted to ‘parameterize our
ignorance’ about space-time foam and to 
compensate for the smallness of the Planck 
length by studying the effects of fuzziness for 
particles travelling across distances of a few 
kilometres on Earth. But then, with 
a second wave of investigations, it
became clear that this strategy could be 
generalized to the case of observations 
of light from certain classes of distant 
astrophysical sources. And it is this 
astrophysics version of the approach 
to testing space-time fuzziness that 
Tamburini et al. adopt in their study.

The authors’ analysis exploits the fact 
that, as a result of space-time fuzziness, 
the wavefront of a light wave from a dis- 
tant source is expected to develop tiny 

Figure 1 | Bumpy journeys. The distance travelled by free
particles in space increases with time, but the exact form of the
increase depends on the nature of the fabric of space-time. In a
smooth space-time, the increase is exactly linear, whereas in a
‘bumpy’ space-time the dependence is only roughly so. Shown here,
qualitatively, are three particle journeys (blue, red and green) in a
bumpy space-time. The effects of space-time bumps are invisibly
small for short distances, but become visible for larger distances.
Tamburini et al. looked for evidence of a bumpy space-time on
light from distant cosmic sources. (Figure courtesy of N. Loret.)

corrugations as it travels, such that some
portions of the wavefront are advanced 
while others are retarded. For propaga- 
tion over large cosmological distances 
from Earth, this wavefront corrugation 
can add up to give a macroscopic effect. 
Tamburini et al. searched for this effect 
in images of around 160 distant quasars 
(extremely luminous galactic nuclei 
powered by supermassive black holes), 
all at redshift greater than 4. Their analy- 
sis benefits from both the large number 


of quasars and a calibration technique that is 
based on images of relatively nearby stars, for 
which the effects of space-time fuzziness are 
expected to be negligible. 

The result of Tamburini and colleagues’ 
analysis is negative: the authors found no evi- 
dence for space-time fuzziness. But this could 
be the start of an exciting season for space-time 
research. Their study convincingly establishes 
that, by adopting techniques based on wave- 
front corrugation, several parameterizations 
of space-time fuzziness occurring at the Planck 
length can already be ruled out and others could 
be tested in the near future. This was inconceiv- 
able in the days of Wheeler's proposal, and only 
a speculative hope even a few years ago™®. 

At this point, theories of space-time fuzzi- 
ness might be lagging behind experiments. It 
is widely appreciated that the multi-parameter 
models adopted in studies such as that of 
Tamburini et al. are rather crude. These mod- 
els should be good enough for preliminary 
estimates but are not sufficiently refined to 
provide detailed guidance for experimentalists 
looking for a manifestation of space-time foam. 
In particular, there is no compelling math- 
ematical description for the behaviour of light 
waves in the presence of fuzziness. So far, the 
descriptions rely on taking as a starting point 
Maxwell’s wave equation and then introduc- 
ing heuristically some plausible implications 
of fuzziness. This is partly why the authors 
of a study” similar to that of Tamburini et al., 
while also concluding that this type of analysis 
can allow detection of fuzziness at the Planck 
scale, legitimately argue for slightly different 
estimates of the bounds presently established 
on some parameterizations of fuzziness. 

The development of refined models 
of space-time foam does not look much 
easier now than it did in the 1960s. But having 
established the possibility of probing fuzzi- 
ness at the Planck length experimentally, 
there will be plenty of motivation for facing 
the challenges that theory presents. If experi- 
mentalists are guided by refined descriptions 


More than a bystander 


The tendency of hydrophobic surfaces to aggregate in water is often invoked to 
explain how biomolecules recognize and bind to each other. Water seems to have 
a much more active role in these processes than had been thought. 


PHILIP BALL 


When biomolecules interact, what do 
the surrounding water molecules 
do? One might think that their job 
would simply be to get out of the way, a crowd 
that must stand aside for the main actors. But 
there is now good reason to believe that water 
has a much more active role in the dialogue 
between the more celebrated constituents 
of the cell. When a protein binds its ligand, 
associates with another protein or folds into 
its functional form, the surrounding solvent 
acts as a versatile intermediary and facilitator. 
Three papers¹⁻³ uncover the subtlety and 
sophistication of that role and, in doing so, 
challenge some common perceptions of how 
biomolecular recognition operates. 

One of the key concepts in the interactions 
of biomolecules is the hydrophobic effect, 
which loosely characterizes the tendency of 
hydrophobic particles and surfaces to aggre- 
gate in aqueous environments*”. Proteins 
typically bury their hydrophobic amino-acid 
residues in their interior as they fold; and 
hydrophobic groups on ligands are generally 
juxtaposed to similar surfaces in an enzyme’s 
binding site. Proteins themselves associate 
into larger aggregates — as functional assem- 
blies, for example, or as fibrillar misfolded 


structures in neurodegenerative diseases — 
by marrying up their hydrophobic surfaces. 
Yet there is still no consensus on how these 
hydrophobic interactions operate. 

The traditional picture, now decades 
old, invokes an enhanced ordering of water 
molecules around the hydrophobe that 
preserves the hydrogen bonding of water 
molecules in the bulk®. In this view, the 
coming together of hydrophobes expels the 
intervening ‘ordered’ water into the bulk 
phase, an entropically favourable process. 
Although still routinely used, this picture 
receives no real support from experiments 
probing local hydration structures’. 

It has been argued’ that there are, in fact, two 
distinct size regimes in which hydrophobic 
effects work. Small hydrophobic particles can 
be accommodated in water without affecting 
hydrogen-bonding networks, whereas larger 
ones unavoidably break hydrogen bonds, gen- 
erating a ‘soft’ interface at the surface of the 
particle that is analogous to that between liquid 
and vapour. These extended surfaces then stick 
together because of a de-wetting transition: at 
a certain separation of particles, the water is 
collectively expelled from between them. The 
crossover between these two pictures is pre- 
dicted to be at size scales of about 1 nanometre 
— precisely the scale that is relevant to the 
association and folding of proteins. Evidence 
for de-wetting in the aggregation of protein 
subunits has, however, been conflicting”. 

of space-time foam, they might find its signa- 
ture in the not-so-distant future. But for now, 
the last pillar of the Newtonian description of 
space-time still stands. 

In Proceedings of the National Academy of 
Sciences, Li and Walker¹ now report that such 
a length-scale crossover exists, albeit one that 
decreases from around 11.4 angstroms at 48 °C 
to 3.5 Å at 150 °C. They have used the tip of an 
atomic-force microscope to unravel a collapsed 
hydrophobic polymer in such a way that mono- 
mers buried in the collapsed mass become 
exposed to water one at a time, whereupon 
they become hydrated by water molecules. The 
authors found that, for polymers in which the 
monomers have large hydrophobic side-chains 
(about 1 nm long), there is a maximum value 
for the free energy of monomer solvation at 
which the entropy of hydration changes from 
being positive to negative — in other words, 
this is the crossover predicted previously’. 

One objection to the de-wetting picture was 
that it was likely to be too difficult to nucleate a 
‘dry’ region between hydrophobic surfaces — 
that is, to create a vapour cavity in the water. 
But a recently published model” of solvation 
suggests that part of the surface of melittin, 
a peptide that can apparently aggregate by 
de-wetting"', is sufficiently hydrophobic to 
permit spontaneous cavitation. Others have 
suggested’*”* that de-wetting draws on the 
intrinsic fluctuations of water density at the 
water-hydrophobe interface, and another 
recent study” has shown that biomolecules 
may tune these fluctuations so that they sit 
close to a de-wetting transition. This tendency 
of biological systems to bring themselves in 
proximity to a phase transition, and thereby to 
enable sensitive and pronounced responses 
to small changes in the environment, is 
probably generic’. 

Figure 1 | Water motion in enzyme-substrate binding. Grossman et al.³ 
report that the movements of water molecules associated with the binding 
site of a zinc metalloprotease enzyme change substantially during binding of 
a substrate to the enzyme. a, In this computer simulation of the free enzyme, 
the enzyme surface is grey and the zinc ion in the active site is yellow. Water 
molecules are shown as spheres, the colours of which indicate the mobility of 
the molecules, based on the timescale at which the hydrogen-bonded network 
of molecules rearranges: red indicates relatively free motion, as found in 
bulk water; cyan indicates strongly retarded motion; and colours in between 

Also in Proceedings of the National Academy 
of Sciences, Snyder et al.” show that the whole 
discourse of the hydrophobic effect, at least in 
ligand binding, has too long been dominated 
by the notion that there is a single explana- 
tion involving the expulsion of water from 
the binding cleft. As with any process, the 
free-energy change associated with ligand 
binding contains an entropic and an enthalpic 
contribution (enthalpy is a measure of the total 
energy of the system). During binding, water 
molecules constrained inside a cleft might be 
released, thereby boosting the free energy of 
binding by increasing entropy. But the enthal- 
pic contribution to that change is by no means 
obvious, and could potentially counteract any 
entropic gain. 
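
The balance described above follows the standard thermodynamic relation ΔG = ΔH − TΔS, in which a negative ΔG favours binding. The sketch below, using purely hypothetical numbers rather than values from any of the studies discussed, illustrates how an unfavourable enthalpy change can cancel an entropic gain from released water. The function name and all input values are assumptions made for illustration.

```python
def binding_free_energy(delta_h: float, delta_s: float, temperature: float) -> float:
    """Return the free-energy change dG = dH - T*dS.

    delta_h      enthalpy change in kJ/mol
    delta_s      entropy change in kJ/(mol K)
    temperature  absolute temperature in K
    """
    return delta_h - temperature * delta_s


T = 298.15          # room temperature in K (assumed)
ENTROPY_GAIN = 0.040  # hypothetical entropy gain from released water, kJ/(mol K)

# Case 1: modestly favourable enthalpy, so the entropic gain adds to it.
favourable = binding_free_energy(delta_h=-5.0, delta_s=ENTROPY_GAIN, temperature=T)

# Case 2: unfavourable enthalpy large enough to outweigh the same entropic gain.
counteracted = binding_free_energy(delta_h=12.0, delta_s=ENTROPY_GAIN, temperature=T)

if __name__ == "__main__":
    print(f"favourable case:   dG = {favourable:+.2f} kJ/mol")
    print(f"counteracted case: dG = {counteracted:+.2f} kJ/mol")
```

With the same entropy term in both cases, only the sign and size of the enthalpy change decide whether binding is favoured, which is why the two contributions have to be measured separately.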

Snyder and colleagues’ results show that it is 
probably unwise to make any generalizations 
about these thermodynamic contributions. 
They have characterized the binding between 
a rigid enzyme and a series of structurally 
related substrates. Some of these ligands 
contained groups that increase their hydro- 
phobic contact area with the binding cavity, 
which has a hydrophobic and a hydrophilic 
side. The authors strikingly conclude that the 
alleged hydrophobic effect is rather insensitive 
to this contact area. Instead, it seems to arise 
primarily from structural changes in the net- 
work of water molecules between the ligand 
and the hydrophilic side of the cavity. Thus, 
although at a broad level the hydrophobic 
effect does involve differences in the structure 
of water close to solute surfaces relative to the 
structure of bulk water, the detailed balance of 
entropy and enthalpy is likely to vary on a case- 
by-case basis, and can be understood only by 
this kind of detailed analysis. Moreover, Snyder 
et al.” point out that “the shape of the water in 
the binding cavity may be as important as the 
shape of the cavity”. 


Although all this makes for a far more 
complicated picture of biomolecular bind- 
ing than the classic geometrical ‘lock and key’ 
model, it is still predicated on a static or quasi- 
equilibrium picture. But that, too, is incomplete, 
according to Grossman and colleagues’ report 
in Nature Structural and Molecular Biology”. 
They have used a combination of spectroscopy 
techniques, coupled to molecular-dynamics 
simulations, to follow changes in water and 
protein dynamics as a zinc metalloprotease 
enzyme binds its substrate. The results offer 
perhaps the most astonishing picture of how 
finely biomolecules manipulate their associated 
water molecules to perform their function. 

The authors’ find that, as enzyme-substrate 
binding develops, but before a full complex is 
formed, the movement of water near the pro- 
tein is retarded (Fig. 1). Crudely put, it is as if 
the water ‘thickens’ towards a more glassy form, 
which in turn calms the fluctuations of the sub- 
strate so that it can become locked securely in 
place. It is not yet clear what causes this solvent 
slowdown as a precursor to binding; indeed, 
the whole question of cause and effect is com- 
plicated by the close coupling of protein and 
water motion and will be tricky to disentangle. 
In any event, molecular recognition here is 
much more than a case of complementar- 
ity between receptor and substrate — it also 
crucially involves the solvent. This suggests that 
changes in protein and solvent dynamics are 
not mere epiphenomena, but have a vital role 
in substrate binding and recognition: they are 
more cause than consequence. 

As well as offering a fresh view of bio- 
molecular shape and function, these findings’ * 
pose a daunting yet stirring challenge. Given 
that most drugs are ligands that bind to bio- 
logical targets, will it be possible to make the 
fine-tuning of water structure and dynamics 
an element of drug design? Indeed, can we 
hope to compete systematically with natural 
recognition processes at drug targets unless 
that mastery is attained? 

Philip Ball is a writer based in London. 

represent intermediate levels of motion. A steep gradient of water motion is 
observed from outside to inside the active site. b, In the early stage of binding, 
a substrate (white) is bound nonspecifically to the surface of the enzyme. 
The substrate has its own cohort of hydrating water molecules. c, Once the 
substrate is specifically bound to the zinc ion, the gradient of water motion 
around the site is far less steep than in a. The motion of the water molecules 
solvating the substrate is also slowed down compared with b. Grossman et al. 
propose that the change in water dynamics assists the binding of the substrate 
to the active site. (Taken from Fig. 5b of the paper³.) 
Chemical ozone destruction occurs over both polar regions in local winter-spring. In the Antarctic, essentially complete 
removal of lower-stratospheric ozone currently results in an ozone hole every year, whereas in the Arctic, ozone loss is 
highly variable and has until now been much more limited. Here we demonstrate that chemical ozone destruction over 
the Arctic in early 2011 was—for the first time in the observational record—comparable to that in the Antarctic ozone 
hole. Unusually long-lasting cold conditions in the Arctic lower stratosphere led to persistent enhancement in 
ozone-destroying forms of chlorine and to unprecedented ozone loss, which exceeded 80 per cent over 18-20 
kilometres altitude. Our results show that Arctic ozone holes are possible even with temperatures much milder than 
those in the Antarctic. We cannot at present predict when such severe Arctic ozone depletion may be matched or 
exceeded. 


Since the emergence of the Antarctic ‘ozone hole’ in the 1980s' and 
elucidation of the chemical mechanisms” and meteorological con- 
ditions® involved in its formation, the likelihood of extreme ozone 
depletion over the Arctic has been debated. Similar processes are at 
work in the polar lower stratosphere in both hemispheres, but differ- 
ences in the evolution of the winter polar vortex and associated polar 
temperatures have in the past led to vastly disparate degrees of spring- 
time ozone destruction in the Arctic and Antarctic. We show that 
chemical ozone loss in spring 2011 far exceeded any previously 
observed over the Arctic. For the first time, sufficient loss occurred 
to reasonably be described as an Arctic ozone hole. 


Arctic polar processing in 2010-11 


In the winter polar lower stratosphere, low temperatures induce 
condensation of water vapour and nitric acid (HNO3) into polar 
stratospheric clouds (PSCs). PSCs and other cold aerosols provide 
surfaces for heterogeneous conversion of chlorine from longer-lived 
reservoir species, such as chlorine nitrate (ClONO2) and hydrogen 
chloride (HCl), into reactive (ozone-destroying) forms, with chlorine 
monoxide (ClO) predominant in daylight’. 

In the Antarctic, enhanced ClO is usually present for 4-5 months 
(through to the end of September)*”’, leading to destruction of most 
of the ozone in the polar vortex between ~14 and 20km altitude’. 
Although ClO enhancement comparable to that in the Antarctic 
occurs at some times and altitudes in most Arctic winters’, it rarely 
persists for more than 2-3 months, even in the coldest years". Thus 
chemical ozone loss in the Arctic has until now been limited, with 
largest previous losses observed in 2005, 2000 and 1996”"7-"*. 

The 2010-11 Arctic winter-spring was characterized by an 
anomalously strong stratospheric polar vortex and an atypically long 
continuously cold period. In February-March 2011, the barrier to 


transport at the Arctic vortex edge was the strongest in either hemi- 
sphere in the last ~30 years (Fig. 1a, Supplementary Discussion). 

The persistence of a strong, cold vortex from December through to 
the end of March was unprecedented. In the previous years with most 
ozone loss, temperatures (T) rose above the threshold associated with 
chlorine activation (T_act, near 196 K, roughly the threshold for the 
potential existence of PSCs) by early March (Fig. 1b, Supplementary 
Figs 1, 2). Only in 2011 and 1997 have Arctic temperatures below T_act 
persisted through to the end of March, sporadically approaching a 
vortex volume fraction similar in size to that in some Antarctic winters 
(Fig. 1b). In 1996-97, however, the cold volume remained very limited 
until mid-January and was smaller than that in 2011 at most times 
during late January through to the end of March (Fig. 1b, Supplemen- 
tary Figs 1, 2). 

Daily minimum temperatures in the 2010-11 Arctic winter were 
not unusually low, but the persistently cold region was remarkably 
deep (Supplementary Figs 1, 2). Temperatures were below T_act for 
more than 100 days over an altitude range of ~15-23 km, compared 
to a similarly prolonged cold period over only ~20-23 km altitude in 
1997; below ~19 km altitude, T < T_act continued for ~30 days longer 
in 2011 than in 1997 (Supplementary Fig. 1b). In 2005, the previous 
year with largest Arctic ozone loss’, T < T_act occurred for more than 
100 days over ~17-23 km altitude, but all before early March. 

The winter mean volume of air in which PSCs may form (that is, 
with T < T_act), V_PSC, is closely correlated with the potential for ozone 
loss”'*"'7. In 2011, V_PSC (as a fraction of the vortex volume) was the 
largest on record (Fig. 1c). Both large V_PSC and cold lingering well into 
spring are important in producing severe chemical loss”'>’*, and 
2010-11 was the only Arctic winter during which both conditions 
have been met. Much lower fractional V_PSC in 1997 than in 1996, 2000, 
2005 or 2011 (Fig. 1c) is consistent with less ozone loss that year¹⁶,¹⁷. 

Figure 1 | Meteorology of the Arctic lower stratosphere. a, Vortex strength 
(as indicated by maximum potential vorticity” (PV) gradients) at 460 K 
potential temperature (~18 km altitude, ~65 hPa level). b, Fraction of vortex 
volume at potential temperatures between 390 and 550 K with a temperature 
less than the chlorine activation threshold (T_act). Light (dark) grey shading 
shows range of Arctic (Antarctic) values for 1979-2010. Antarctic dates are 
shifted by six months (top axis in a) to show the equivalent season. c, Winter 
mean V_PSC during the past 32 years, expressed as a fraction of vortex volume. 
Red, orange, green, purple and blue lines/bars show the 2010-11, 2004-05, 
1999-2000, 1996-97 and 1995-96 Arctic winters, respectively. 


Factors playing secondary parts in governing interannual variability 
in ozone destruction, including vortex strength, structure and posi- 
tion relative to the cold region, also favour large loss in 2011 (Sup- 
plementary Figs 2, 3, Supplementary Discussion). However, despite 
the fraction of the vortex with T < T_act and mid-March temperatures 
sporadically approaching those seen in the Antarctic (Fig. 1b, 
Supplementary Fig. 1a), even in 2011 temperatures were much higher, 
and the cold regions much smaller, than those in most Antarctic 
winters. 

Satellite trace-gas and PSC measurements highlight the stark con- 
trast between polar processing in 2010-11 and that in typical Arctic 
winters, and the parallels with Antarctic conditions (Figs 2, 3). In 2011, 
PSCs or aerosols were abundant until mid-March (Fig. 3a; consistent 
with a deep region with T < T_act, Fig. 3b), much later than usual in the 
Arctic'* °°, with vortex-average amounts at some altitudes similar to 
those in the Antarctic and dramatically larger than the near-zero values 
at that time in most Arctic winters. Furthermore, PSCs in 2011 
spanned an altitude range comparable to that in the Antarctic, an 
uncommon occurrence in the Arctic'**°. Particles in long-lasting 
PSCs can grow large enough to sediment, resulting in denitrification, 
permanent removal of HNO3 from the stratosphere””*. By late March 
2011 no PSCs remained (Fig. 3a), yet HNO3 mixing ratios were much 
lower than observed in any previous Arctic winter (Fig. 2a). The con- 
tinuing depression in HNO3 after PSCs had evaporated indicates 
denitrification. Albeit less severe than in typical Antarctic winters 
(Fig. 2b, c, 3c), the extent and degree of denitrification in 2011 were 
unmatched in the Arctic, approaching the range of Antarctic condi- 
tions for the first time. 

Decreasing HCl and increasing ClO signify chlorine activation 
(Fig. 2d-i). Some ClO enhancement has occurred in all recent 
Arctic winters, but has never been as prolonged and extensive as that 
in 2011. In late February, high ClO pervaded the sunlit portion of the 
vortex. The 2011 values vastly exceed the range previously observed in 
the Arctic from late February through to the end of March. They also 
briefly lie outside the Antarctic seasonal envelope, primarily because 
the higher solar zenith angles of the Antarctic measurements used 
here lead to ~30% lower ClO under fully activated conditions. In late 
February, HCl values (unaffected by solar zenith angle issues) fall 
along the lower boundary of the Antarctic envelope, confirming the 
picture seen in ClO. The vertical extent of chlorine activation was also 
comparable to that in the Antarctic (Fig. 3d, e). 

In previous cold Arctic winters, chlorine was deactivated (converted 
from ozone-destroying forms into less reactive reservoir species) by 
mid-March"; even in 1997, ClO started to decline by late February 
(Fig. 2g). In 2011, by contrast, ClO began decreasing rapidly only about 
a week earlier than is typical in the Antarctic. ClO data in late February 
1997 indicate that not only were maximum values lower than those in 
early March 2011, but also the vertical range of enhancement was 
shallower, with weaker activation at low altitudes than in 2011 
(Fig. 3e), consistent with the higher altitudes and decreasing extent 
(Figs 1b, 3b, Supplementary Fig. 2) of T < T_act. 

When chlorine is deactivated, whether it is converted first into HCl 
or ClONO2 depends sensitively upon HNO3 and ozone abundances. In 
the Arctic, chlorine is normally deactivated through initial reformation 
of ClONO2. In the severely denitrified and ozone-depleted Antarctic 
vortex, production of ClONO2 is suppressed and that of HCl highly 
favoured’*!, In March 2011, the recovery of HCl followed a much 
more Antarctic-like pathway than has been observed in any other 
Arctic winter. 

The largest Arctic chemical ozone loss was previously observed in 
2005, followed closely by 2000 and 1996”'*""*. Although low tempera- 
tures persisted until the end of March 1997, the ozone loss in that year 
was far less. No previous year rivals 2011, when the evolution of Arctic 
ozone more closely followed that typical of the Antarctic (Fig. 2j). Ozone 
profiles in late March 2011 resemble typical Antarctic late-winter pro- 
files much more strongly than they do the average Arctic one (Fig. 3f). 
Because mixing in April 2011 (for example, lamination events larger 
than that shown in Fig. 3f) entrained ozone-rich air into the vortex, the 
slight decrease in vortex-averaged ozone at a potential temperature of 
485 K from 26 March to 20 April (from ~1.8 to ~1.6 p.p.m.v., Fig. 2j) 
indicates continuing chemical loss during this interval. 


Estimates of chemical ozone loss 


Chemical loss is difficult to quantify in the Arctic, where transport 
from above replenishes ozone in the lower stratospheric vortex, 
obscuring the signature of chlorine-catalysed destruction’*”*”*. The 
evolution of the long-lived trace gas nitrous oxide (N2O) reflects steady 
downward transport throughout the 2010-11 winter-spring, indi- 
cating that subsidence partially masked chemical loss. Horizontal 
transport can also confound the signature of chemical loss, bringing 
air into the vortex that has either higher** or lower’* concentrations of 
ozone, depending on the altitude and latitude from which it originates. 

Representative results from two types of chemical loss calcula- 
tions**** based on balloon-borne and satellite observations are shown 
in Fig. 4. The differences (up to ~0.4 p.p.m.v. at the end of March 
2011) in estimates derived from the various methods and data sets 
imply some uncertainty in the chemical loss determination. Year-to- 
year differences in the amount of ozone loss are very similar when 
obtained from any method/data set combination, however, indicating 
a high degree of precision in the relative amount of calculated loss 
Figure 2 | Chemical composition in the lower stratosphere. a-l, Maps (right) 
and vortex-averaged time series (left) at 485 K potential temperature (~20 km, 
~50 hPa) for four different gases: HNO3 (a, b, c), HCl (d, e, f), ClO (g, h, i) and 
O3 (ozone; j, k, l); mixing ratios from Aura MLS are shown. Averaging for the 
time series is done within the white contour shown on the maps. Blue (purple) 
triangles on time series, 1995-96 (1996-97) values from UARS MLS. Line 
colours/shading as in Fig. 1, but shading is for Aura MLS measurements from 


between different years. Chemical destruction was severe between 
~16 and 22 km altitude, with the largest loss exceeding 2.5 p.p.m.v. 
by 26 March 2011 (Fig. 4a). By 31 March 2011, chemical loss was 
nearly double that in 2005 from ~18 to above 22 km, and similar to 
that in 2005 at lower altitudes (Fig. 4b, c). From ~18 to 20 km, more 
than 80% of the ozone present in January had been chemically 
destroyed by late March. Chemical removal in 1996 and 2000 started 
at a rate similar to that in 2011 (Fig. 4c), but ceased by late March; 
maximum losses in 2000 approached those in 2011, but extended over 
a much smaller vertical range (Fig. 4b). Loss in 1996, 2000 and 2005 
considerably exceeded that in 1997, with greater destruction at lower 
altitudes in those years contributing more to total column loss”'*”’. 
Chemical loss in 2011 was two to three times larger than that in 1997, 
and about twice that in 1996 and 2005 above ~16 km; from ~15 to 
23 km it was comparable to that in the Antarctic ozone hole in 1985”. 


2005-10. Antarctic dates are shifted by six months (top axis on time series) to 
show the equivalent season. Vertical lines show dates of maps in 2011 (2010) in 
the Arctic (Antarctic). Black overlays on HNO3 maps, T_act (~196 K at this 
level); HNO3 may be sequestered in PSCs at lower temperatures. Dotted black/ 
white contour on ClO maps, 92° SZA, poleward of which measurements were 
taken in darkness. Yellow/black triangles on ozone maps, locations of the 
profiles in Fig. 3. 


Single ozone-sonde station measurements in early April 2011 suggest 
continuing ozone loss (Fig. 4c). 
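
The passive-subtraction idea behind such estimates can be sketched in a few lines: chemical loss is taken as the difference between the ozone that a transport-only ('passive') tracer would show and the ozone actually observed. The mixing ratios below are illustrative placeholders, not measurements from this study, and the function names are hypothetical.

```python
def chemical_loss(passive_ppmv: float, observed_ppmv: float) -> float:
    """Chemical ozone loss (p.p.m.v.): passive (transport-only) minus observed ozone."""
    return passive_ppmv - observed_ppmv


def fraction_destroyed(initial_ppmv: float, loss_ppmv: float) -> float:
    """Fraction of the initial ozone that has been chemically destroyed."""
    return loss_ppmv / initial_ppmv


# Hypothetical vortex-average values at a single level (p.p.m.v.)
INITIAL_JANUARY = 3.0   # ozone present at the start of the period (assumed)
PASSIVE_MARCH = 3.1     # what transport alone would have produced (assumed)
OBSERVED_MARCH = 0.6    # what was actually observed (assumed)

loss = chemical_loss(PASSIVE_MARCH, OBSERVED_MARCH)
fraction = fraction_destroyed(INITIAL_JANUARY, loss)

if __name__ == "__main__":
    print(f"chemical loss: {loss:.2f} p.p.m.v. ({fraction:.0%} of the January amount)")
```

Because downward transport replenishes ozone, the passive value can exceed the initial one, so comparing observed ozone with the initial amount alone would understate the chemical loss.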

Although the meteorology during March-April was similar in 1997 
and 2011, ozone loss was much more pronounced in 2011. Photo- 
chemical box model simulations (Supplementary Fig. 4, Supplemen- 
tary Discussion) elucidate how early winter conditions set the stage for 
record springtime ozone destruction in 2011. Chlorine activation 
brought on by enduring cold from December through to the end of 
February led to ~0.7-0.8 p.p.m.v. lower ozone at the beginning of 
March 2011 (Figs 2j, 4c). The early onset of continuous cold also 
facilitated formation of PSC particles large enough to sediment, result- 
ing in ~4 p.p.b.v. less HNO3 by March in 2011 than in 1997 (Fig. 2a). 
The degree of denitrification has a profound impact on the severity of 
springtime Arctic ozone loss*’. By delaying chlorine deactivation, 
lower HNO3 by 1 March was responsible for ~0.6 p.p.m.v. more ozone 
Figure 3 | Vertical composition information. a, Red, PSCs/aerosol amounts 
averaged in the vortex over a week centred around 25 February 2011; dark blue, 
the average for the same week in 2007-10; grey, the average over the equivalent 
period (centred on 28 August) for the Antarctic in 2006-10; lavender, the Arctic 
average for a week centred around 26 March 2011. (In late winter-spring, 
maximum PSC altitudes are generally higher in the Arctic because early winter 
PSC activity redistributes HNO3 and water vapour to lower altitudes in the 
Antarctic'*). b-f, Daily average profiles of MERRA temperatures (b) and MLS 
HNO3 (c), HCl (d), ClO (e) and ozone (f). Red lines, data from a 4° × 15° 
latitude × longitude box around 79° N, 12° E; in c, f, taken on 26 March; in 


loss after that date in 2011 than in 1997 (Supplementary Fig. 4, 
Supplementary Discussion). The effects of denitrification and early- 
winter loss together account for the disparity in ozone depletion in 
these two winters (~1.5 p.p.m.v. more loss at 460 K in 2011 than in 
1997, Fig. 4c, Supplementary Fig. 4). Loss as severe as that in 2011 thus 
b, d, e, on 6 March 2011. Lavender, 7-day average for 2005-10 (1980-2010 for 
b) centred on the same location and days. Grey, profiles in a similar box in the 
Antarctic (79° S, 12° E) on 26 September for c, f, and on 8 September 2010 for 
b, d, e. Dotted black line in b, approximate T_act (195 K), see text. Purple line in 
b, 7-day average around 6 March 1997, centred on same location. Purple line in 
e, a midday ClO profile from UARS MLS on 26 February 1997 averaged in an 
8° × 30° box centred at the same Arctic location. A high-resolution ozone- 
sonde profile at Ny-Ålesund on 26 March 2011 (black in f) agrees well with 
MLS; lamination, a signature of mixing with ozone-rich extra-vortex air, is 
apparent as a local maximum near 60 hPa. 


requires T < T_act, with consequent chlorine activation and ozone 
destruction, early in winter (as in 1996, 2000 and 2005, but not in 
1997), a cold period and region before March sufficient to allow wide- 
spread denitrification, and the persistence of a cold polar vortex into 
April (as in 1997, but not in 1996, 2000 or 2005). 
Figure 4 | Chemical ozone loss estimates. a, Chemical loss as a function of 
time and potential temperature from passive subtraction of MLS and ATLAS 
passively-transported ozone (initialized with December MLS data). 

b, Chemical loss from ozone sondes in unmixed vortex air as a function of 
‘spring equivalent potential temperature™* (black contours in a). Shading, 
Antarctic range defined by 1985 (the first year with profile measurements 
inside the ozone hole”) and 2003 (a recent year with a severe ozone hole). The 
2003 Antarctic curve is shifted by six months minus 10 days because ozone 
sondes that year predominantly sampled the outermost vortex, where ozone 
loss begins earliest. c, Ozone at a spring equivalent potential temperature of 
465K (white contour in a), near the level of maximum chemical loss. Shading, 
the region below the minimum reached in the 1985 Antarctic ozone hole. In 
April 2011 most soundings sampled the disturbed vortex edge; only two were 
made in air uninfluenced by mixing (red dots). Error bars, 1o uncertainties 
based on the scatter of individual ozone-sonde measurements. Line colours as 
in Fig. 1; 1998-99 (a winter with no ozone loss) is shown in cyan. NH, Northern 
Hemisphere; SH, Southern Hemisphere. 
Column ozone


Total column ozone is a predominant factor determining exposure of 
Earth’s surface to ultraviolet radiation. In the context of previous
Arctic winters, 2011 was truly remarkable: the fraction of the Arctic 
vortex in March with total ozone less than 275 Dobson units (DU) is 
typically near zero, but reached nearly 45% in 2011 (Fig. 5a). Because 
of the dynamically-driven correlation between total ozone and lower-stratospheric
temperature (Supplementary Discussion), the
abiding cold in 1997 and 2011 would have led to lower March total 
ozone than in other Arctic winters even without chemical loss; 
dynamical conditions in March-April 1997 particularly favoured 
low total ozone (Supplementary Discussion). In March 2011, however,
the area of low total ozone covered more than twice as much of the 
vortex as in 1997, and the daily vortex ‘ozone deficit’ (Supplementary 
Fig. 5a) was 30-50 DU larger, consistent with the greater chemical loss 
(Fig. 4). Maximum 2011 vortex fractions of low ozone approached 
those in early Antarctic ozone holes (Fig. 5a). The close correspond- 
ence between the vortex and both low total ozone and the large Arctic 
total ozone deficit (Fig. 5b, d) implies that low total ozone in March 
2011 resulted primarily from chemical loss (Supplementary
Discussion). The ozone deficit in the Antarctic (Fig. 5e) shows a
maximum over 0-90° W, and a minimum over 90-200° E, reflecting
a vortex position in 2010 different to that in the reference state (which 
is less robust than that for the Arctic). Differences in morphology deep 
in the vortex are, however, minimal. The 2011 Arctic ozone deficit was 
at least comparable to that in the 2010 Antarctic vortex core at an 
equivalent time. 
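
The vortex-fraction diagnostic quoted above (the share of vortex area with total ozone below 275 DU) can be illustrated with a short sketch. This is not the authors' processing code: the function name, the tuple layout of the input and the cos(latitude) area weighting for a regular latitude-longitude grid are assumptions made here for illustration; only the 275 DU threshold comes from the text.

```python
import math


def low_ozone_fraction(cells, threshold_du=275.0):
    """Area-weighted fraction of vortex cells with total ozone below a threshold.

    cells: iterable of (latitude_deg, total_ozone_du, inside_vortex) tuples,
    a hypothetical layout for cells of a regular latitude-longitude grid.
    """
    vortex_area = 0.0
    low_area = 0.0
    for lat_deg, ozone_du, inside_vortex in cells:
        if not inside_vortex or ozone_du is None:
            continue  # skip cells outside the vortex or without data
        # Assumption: cell area on a regular grid scales with cos(latitude).
        weight = math.cos(math.radians(lat_deg))
        vortex_area += weight
        if ozone_du < threshold_du:
            low_area += weight
    return low_area / vortex_area if vortex_area > 0 else float("nan")
```

A value near 0.45 from such a calculation would correspond to the "nearly 45%" quoted for March 2011; the real analysis uses OMI and TOMS data and a vortex edge defined at 460 K.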
Figure 5 | Total column ozone. a, Time series of the fraction of 460 K vortex
area with total ozone below 275 Dobson units (DU) in February-April in the
Arctic (bottom axis), and in August-October in the Antarctic (top axis). Line
colours/shading as in Fig. 1. 2005-2011 values are from OMI; earlier values are
from TOMS (Total Ozone Mapping Spectrometer) instruments. Maps show
OMI total ozone (b, c) and ozone deficit (d, e) in the Arctic (Antarctic) on 
26 March 2011 (26 September 2010). Overlays as in Fig. 2 but at 460 K. 
An echo of the Antarctic


In the absence of chemical ozone loss, downward transport during 
winter results in a springtime maximum in total ozone; because this 
transport is stronger in the Arctic, background ozone levels there are 
~100 DU higher than those in the Antarctic. Therefore Arctic
spring total ozone could, even after chemical destruction comparable 
to that in an Antarctic ozone hole (commonly defined by values less 
than 220 DU; refs 7, 12), exhibit only a weak maximum in total ozone 
rather than a well-defined minimum. Examination of the long-term 
ozone-sonde record in the Arctic shows that abundances near 250 DU 
or less are well below typical autumn values, thus appearing as a ‘hole’ 
in total ozone. Dynamical processes can result in transient regions of 
very low total ozone (Supplementary Discussion, Supplementary Figs 
5, 6) and/or local minima in lower-stratospheric ozone profiles (for
example, via ozone-poor extra-vortex air transported into the polar
vortex). For an interhemispheric comparison of chemical loss, it is
thus important to verify that observed Arctic ozone decreases were 
primarily related to chemical, rather than dynamical, processes. 

Figure 4 shows that the precipitous decline in Arctic ozone in 
February-March 2011 resulted from chemical loss of similar mag- 
nitude to that in the Antarctic in the mid-1980s. Observed ozone 
between ~15 and 20 km altitude decreased to values matching the
minima in early Antarctic ozone holes and those reached at the corresponding
time in some recent Antarctic winters (Figs 2j-l; 3f). In
late March-early April, most ozone-sonde profiles in the vortex had 
mixing ratios less than 1 p.p.m.v., with values ~0.7 p.p.m.v. over an 
approximately 2-km altitude region, and some dipping to 0.5 p.p.m.v. 
(Supplementary Fig. 7). Minimum total ozone in spring 2011 was 
continuously below 250 DU for ~27 days (Supplementary Fig. 5b), 
with a maximal area below that level of ~2 × 10⁶ km² (roughly five
times the area of Germany or California). Values dropped to ~220- 
230 DU for about a week in late March 2011. 

In these respects, chemical ozone destruction in the 2011 Arctic 
polar vortex attained, for the first time, a level clearly identifiable as an 
Arctic ozone hole. On the other hand, although the magnitude of 
chemical depletion was comparable to that in the Antarctic, total 
ozone values remained higher and, because the areal extent of the 
Arctic vortex was much smaller (~60% the size of a typical 
Antarctic vortex), the low-ozone region was more confined. 

The Arctic winter stratosphere exhibits striking interannual vari- 
ability. The past decade has included the four most dynamically active 
(hence among the warmest) Arctic winters in the past 32 years (ref.
35) and now the two coldest winters with largest ozone loss,
extending the previously noted trend of the coldest winters becoming
colder. Had implementation of the Montreal Protocol not curbed
the increase in stratospheric halogen loading, formation of an Arctic 
ozone hole would have already become common even in moderately 
cold winters. Even with the lower anthropogenic halogen levels
actually reached, the potential for Antarctic-like ozone loss in the 
Arctic in the event of a persistently cold winter-spring such as that 
in 2010-11 has been recognized for decades. Despite temperatures
that were generally far higher than those in Antarctic winter, Arctic 
chemical ozone destruction in 2011 rivalled that in some Antarctic 
ozone holes. The development of an Arctic ozone hole under condi- 
tions only slightly more extreme than those in some previous Arctic 
winters raises the possibility of yet more severe depletion as lower- 
stratospheric temperatures decrease. More acute Arctic ozone 
destruction could exacerbate biological risks from increased ultra- 
violet radiation exposure, especially if the vortex shifted over densely 
populated mid-latitudes, as it did in April 2011. 

Our present understanding of what drives variability in the Arctic 
winter stratosphere is incomplete. Stratospheric temperatures and 
vortex evolution depend on the atmosphere’s radiative properties 
and propagation of wave activity, which are being modified by
increasing greenhouse gas concentrations. Day-to-day tropospheric 
disturbances can lead to stratospheric warming or cooling, depending 
on their geographical location and the stratospheric vortex structure,
which controls their upward propagation. Current climate models
do not fully capture either the observed short-timescale patterns of 
Arctic variability or the full extent of the observed longer-term cooling 
trend in cold stratospheric winters; nor do they agree on future circulation
changes that affect trends in transport. Our ability to
predict when conditions similar to, or more extreme than, those in 
2011 may be realized is thus very limited. Improving our predictive 
capabilities for Arctic ozone loss, especially while anthropogenic 
halogen levels remain high, is one of the greatest challenges in polar 
ozone research. Comprehensive stratospheric data sets, such as those 
used here, are critical to meeting that challenge. 


METHODS SUMMARY 


MERRA (Modern Era Retrospective-analysis for Research and Applications)
fields are used for temperature and vortex analysis and for vortex averaging of
composition measurements. The CALIOP (Cloud-Aerosol Lidar with
Orthogonal Polarization) on the CALIPSO (Cloud-Aerosol Lidar and Infrared
Pathfinder Satellite Observations) satellite provides PSC/aerosol information.

Trace gas profiles are from the Microwave Limb Sounder (MLS) on NASA’s
Aura satellite. Only daytime ClO measurements are used. Northern (southern) 
high latitudes are sampled near midday (in late afternoon), thus the average solar 
zenith angle (SZA) of MLS Antarctic measurements is ~7° higher than that in the 
Arctic. Reactive chlorine partitioning shifts away from ClO at higher SZAs,
leading to ~30% lower ClO measured in the Antarctic than in the Arctic under 
fully activated conditions. An instrument anomaly disrupted MLS measurements 
from 27 March to 20 April 2011. UARS (Upper Atmosphere Research Satellite) 
MLS measurements, used for 1995-1996 and 1996-1997 analyses, are sparse 
because of the UARS yaw cycle and other measurement gaps.

Total column ozone is measured by the Dutch-Finnish Ozone Monitoring 
Instrument (OMI) on Aura. Total ozone ‘deficit’ is the difference between daily
values and a reference that is minimally affected by chemical loss. 

Measurements from MLS and the Match network of balloon-borne ozone 
soundings (ozone sondes) are used to estimate chemical ozone loss in two ways.
The difference between calculated ‘passive’ (influenced only by transport) ozone
and observed ozone is computed, with passive ozone obtained using MLS nitrous
oxide, a ‘reverse trajectory’ model, and the ATLAS (Alfred Wegener Institute
Lagrangian Chemistry/Transport System) model. Vortex ozone is also examined
on the surfaces on which it subsides, with descent rates from modelled
radiative heating/cooling rates averaged over the polar vortex.

Photochemical box model runs were performed using the chemical model 
from ATLAS to test the sensitivity of ozone loss to initial ozone amounts and
denitrification. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS

Data sets. Modern Era Retrospective-analysis for Research and Applications
(MERRA) fields, from the Goddard Earth Observing System Version 5.2.0
(GEOS-5) data assimilation system, are used for the temperature and vortex
analysis. The Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP) on
the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observations
(CALIPSO) satellite provides PSC/aerosol information. CALIOP measurements
began in April 2006. Trace gas profile measurements are from the
Microwave Limb Sounder (MLS) on NASA’s Aura satellite, and the predecessor
MLS instrument on the Upper Atmosphere Research Satellite (UARS). Total
column ozone data are from the Dutch-Finnish Ozone Monitoring Instrument
(OMI) on board Aura. The historical total ozone record comprises data from
Nimbus-7 and Earth Probe Total Ozone Mapping Spectrometer (TOMS). Aura
MLS and OMI measurements are available from August 2004 through to the
present. UARS MLS measurements were obtained from September 1992 through
to early 2000, with increasingly sparse sampling in the later years. TOMS data
are available beginning in 1979, but no TOMS instrument was taking measurements
during the 1995-96 Arctic winter.

Measurements from the Match network of balloon-borne ozone soundings 
(ozone sondes) are used in some of the chemical ozone loss estimates.

Temperature and vortex analysis. Potential vorticity (PV) is used to define
the vortex, with a contour of ‘scaled’ PV of 1.4 × 10⁻⁴ s⁻¹ (in vorticity units)
demarking the vortex edge. Vortex strength is diagnosed as the maximum
daily gradient in PV as a function of equivalent latitude (the latitude that would
enclose the same area between it and the pole as a given PV contour). Scaled
PV multiplied by 10° is used in the calculation, resulting in units for its gradient of 
10° *(s degrees equivalent latitude) '. 
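
The definition of equivalent latitude given above (the latitude that encloses the same area between it and the pole as a PV contour) follows from the area of a spherical cap. The sketch below is illustrative only: the function names and the mean Earth radius of 6,371 km are assumptions introduced here, and the area enclosed by the PV contour is taken as an input rather than computed from PV fields.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed value for this sketch


def polar_cap_area_km2(lat_deg, radius_km=EARTH_RADIUS_KM):
    """Area between the latitude circle lat_deg and the pole on a sphere."""
    return 2.0 * math.pi * radius_km ** 2 * (1.0 - math.sin(math.radians(lat_deg)))


def equivalent_latitude_deg(enclosed_area_km2, radius_km=EARTH_RADIUS_KM):
    """Latitude whose polar cap has the same area as a given PV contour."""
    hemisphere_area = 2.0 * math.pi * radius_km ** 2
    if not 0.0 <= enclosed_area_km2 <= 2.0 * hemisphere_area:
        raise ValueError("enclosed area must lie between 0 and the full sphere")
    # Invert A = 2*pi*R^2 * (1 - sin(phi)) for the latitude phi.
    return math.degrees(math.asin(1.0 - enclosed_area_km2 / hemisphere_area))
```

A contour enclosing no area maps to 90°, and one enclosing a full hemisphere maps to 0°; PV gradients with respect to this coordinate can then be taken by finite differences.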

The temperature threshold for chlorine activation, T_act, is estimated using the
formula for nitric acid trihydrate formation, which depends on pressure, HNO3
and H2O. Climatological HNO3 and H2O profiles are used, derived from UARS
data. The area with T < T_act is calculated on seven isentropic surfaces in the lower
stratosphere: 390, 410, 430, 460, 490, 520 and 550 K; T_act on these levels is 197.5,
197.2, 196.8, 196.5, 195.9, 195.3 and 194.5 K, respectively. To get the volume with
T < T_act from 380 through 565 K, the areas at each of the seven levels are multiplied
by the estimated altitude associated with that layer and summed. The
altitude range associated with each layer is obtained from a standard potential
temperature profile as a function of altitude derived from high-latitude temperature
soundings taken during the 1988-89 through to 2001-02 winters (the same
profile was used for V_PSC calculations in refs 13, 16 and 48). These thicknesses are
1.29088, 1.19995, 1.36770, 1.46281, 1.30554, 1.18199 and 1.07382 km for the
seven levels listed above. Vortex volume is calculated from vortex area in the
same manner. Winter mean V_PSC is calculated over 16 December through to 15
April. Previous studies have shown that V_PSC scaled by the vortex area is a good
proxy for chlorine activation and ozone loss potential. Additional temperature
and vortex diagnostics are described in Supplementary Information.
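
The volume calculation described above is a thickness-weighted sum of areas over the seven isentropic levels. A minimal sketch follows; the levels and layer thicknesses are those listed in the text, whereas the function names and any input areas are hypothetical and used only for illustration.

```python
# Isentropic levels (K) and layer thicknesses (km) as listed in the text.
THETA_LEVELS_K = (390, 410, 430, 460, 490, 520, 550)
LAYER_THICKNESS_KM = (1.29088, 1.19995, 1.36770, 1.46281,
                      1.30554, 1.18199, 1.07382)


def layer_volume_km3(areas_km2):
    """Sum of area x layer thickness over the seven levels.

    areas_km2: one area per level in THETA_LEVELS_K, for example the area
    with temperature below the activation threshold, or the vortex area.
    """
    if len(areas_km2) != len(LAYER_THICKNESS_KM):
        raise ValueError("expected one area per isentropic level")
    return sum(a * dz for a, dz in zip(areas_km2, LAYER_THICKNESS_KM))


def cold_volume_fraction(cold_areas_km2, vortex_areas_km2):
    """Cold volume expressed as a fraction of the vortex volume."""
    return layer_volume_km3(cold_areas_km2) / layer_volume_km3(vortex_areas_km2)
```

Because the same thicknesses are applied to both the cold area and the vortex area, the ratio depends only on the areas at each level.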

Polar stratospheric cloud and aerosol information. Particulate backscatter 
averaged over the polar vortex derived from CALIOP data is used to provide 
PSC/aerosol information. Total attenuated backscatter at 532 nm, b(z), is one of
the basic CALIOP Level 1B data products. b(z) is the sum of the particulate
backscatter (due to liquid aerosol and PSCs), b_p(z), and molecular backscatter,
b_m(z). b_m(z) is calculated using GEOS-5 molecular density profiles (included in
the CALIOP Level 1B data files) and a theoretical value for the molecular scattering
cross-section. Profiles of b_p(z) are then produced by subtracting b_m(z) from
b(z). Vortex-averaged profiles of b_p(z) are produced by averaging all CALIOP
b_p(z) profiles located inside the vortex edge (defined using information available
in GEOS-5 Derived Meteorological Product (DMP) files for the nearly-coincident
Aura MLS data) over the selected time interval.
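
The two steps just described (removing the molecular contribution, then averaging the profiles that fall inside the vortex) can be sketched as follows. The function names and the plain-list profile layout are assumptions made here; this is not the CALIOP processing software, and the selection of in-vortex profiles is taken as already done.

```python
def particulate_backscatter(total, molecular):
    """Particulate backscatter profile: total minus molecular, level by level."""
    if len(total) != len(molecular):
        raise ValueError("profiles must share the same altitude grid")
    return [b - b_m for b, b_m in zip(total, molecular)]


def vortex_average(profiles):
    """Mean profile over a list of in-vortex profiles on a common altitude grid."""
    if not profiles:
        raise ValueError("no profiles inside the vortex")
    n_levels = len(profiles[0])
    if any(len(p) != n_levels for p in profiles):
        raise ValueError("profiles must share the same altitude grid")
    return [sum(p[k] for p in profiles) / len(profiles) for k in range(n_levels)]
```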

MLS trace gas profile measurements and analysis. Trace gas profile measurements
of HNO3, HCl, ClO, ozone and N2O (a long-lived tracer used to assess
descent) are from Aura MLS version 3 retrievals; data quality screening is as
recommended in the MLS data quality document. MLS data are retrieved on
pressure surfaces; potential temperature as a function of pressure from MLS
DMPs calculated from GEOS-5 analyses is used to interpolate to isentropic
surfaces. Vortex averages of MLS data are calculated using the 1.4 × 10⁻⁴ s⁻¹
scaled PV contour to define the vortex edge, using PV values from the MLS
DMPs. Active chlorine is in the form of ClO mainly during the daytime, and
thus measured ClO amounts vary with the solar zenith angle (SZA) at which the
measurements are taken. Only daytime ClO measurements are used here.
Northern high latitudes are sampled near midday local time, whereas southern high
latitudes are sampled in late afternoon; thus the SZA of Aura MLS Antarctic
measurements is ~7° higher on average than that in the Arctic. Reactive chlorine
partitioning shifts away from ClO at higher SZAs, leading to ~30% lower ClO
measured by Aura MLS in the Antarctic than in the Arctic under fully activated
conditions. MLS measurements are unavailable from 27 March through to
20 April 2011 because of an instrument anomaly. Upper Atmosphere Research
Satellite (UARS) MLS measurements, used for analysis of 1995-96 and 1996-97,
are sparse because of the UARS yaw cycle and other measurement gaps. The
time of day of UARS measurements varied through the yaw cycle, in the middle of
which no daytime ClO measurements were obtained; thus ClO values shown in
1995-96 and 1996-97 near those dates (including the mid-February 1996 measurements
shown in Fig. 2g) are not representative of the degree of chlorine
activation.

Chemical loss calculations. Chemical ozone loss is quantified by two methods,
both widely used for such calculations. In the ‘passive subtraction’
method, a transport model is used to calculate the evolution of ozone in
the absence of chemical changes (‘passive’ ozone). The difference between passive
ozone and observed ozone provides an estimate of chemical loss.
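
The passive subtraction step itself reduces to a level-by-level difference between the passive and observed profiles. The sketch below shows only that final step, under the assumption that both profiles are already vortex averages on the same potential temperature levels; the function names and example layout are hypothetical, and the transport calculation that produces passive ozone is not represented.

```python
def chemical_loss(passive_ppmv, observed_ppmv):
    """Chemical loss per level: passive ozone minus observed ozone (p.p.m.v.)."""
    if len(passive_ppmv) != len(observed_ppmv):
        raise ValueError("profiles must be on the same levels")
    return [p - o for p, o in zip(passive_ppmv, observed_ppmv)]


def level_of_max_loss(levels_k, passive_ppmv, observed_ppmv):
    """Return (potential temperature level, loss) where the loss is largest."""
    loss = chemical_loss(passive_ppmv, observed_ppmv)
    if len(levels_k) != len(loss) or not loss:
        raise ValueError("levels and profiles must be non-empty and aligned")
    k = max(range(len(loss)), key=loss.__getitem__)
    return levels_k[k], loss[k]
```

Positive values indicate that observed ozone is lower than transport alone would give, which is the quantity attributed to chemistry.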

Here, passive ozone is obtained in three different ways. First, MLS observations
of N2O, a long-lived species unaffected by chemical processes, are used to calculate
vertical motion, and that estimate of descent is then used to calculate how
initial MLS ozone profiles would have evolved in the absence of chemical loss.
Second, a ‘reverse trajectory’ transport model is used to transport an initial
state based on MLS-observed ozone with no chemistry. Finally, the ATLAS
(Alfred Wegener Institute Lagrangian Chemistry/Transport System) chemistry
and transport model is run in passive mode, initialized with MLS ozone.

Vortex ozone is also examined in relation to the surfaces on which it is subsiding.
The descent rates used here are obtained by averaging radiative
heating/cooling rates from the radiation calculation used in the ATLAS model
over the polar vortex. These rates are then used to examine vortex-averaged
MLS and ozone-sonde data on surfaces of ‘spring equivalent potential temperature’,
defined as the potential temperature at which air originating at a given
level arrived at the end of March. Since the air descended on these surfaces, ozone
would have been constant on each such surface in the absence of chemical loss.

The ozone-sonde data used here are all from electrochemical concentration cell 
(ECC) sondes, made by different manufacturers. Ozone-sonde data quality was 
assessed in an intercomparison experiment and is discussed in ref. 47. For
chemical loss calculations using ozone-sonde data, the profiles are first examined 
using a procedure for detecting lamination in the profiles; such lamination (an 
example is shown in Fig. 3f) is associated with mixing in of extra-vortex air, which 
may obscure the signature of chemical loss. Profiles that have been significantly 
altered by mixing processes, as indicated by lamination, are excluded from the 
vortex averages used in the chemical loss calculations. 2010-11 Arctic ozone- 
sonde data are provided as Supplementary Information. 

Results from the ATLAS model passive subtraction calculations, and from the 
calculations on spring equivalent potential temperature surfaces using the Match 
network ozone-sonde data, are shown in Fig. 4; all panels show vortex averages. 
These results have been compared with the results from the other methods 
described above. While absolute ozone values obtained from different methods/ 
data sets vary significantly (up to ~0.4 p.p.m.v. at the end of March 2011), the 
year-to-year variations in chemical loss calculated using all three methods agree 
closely, indicating a high degree of precision in the relative amount of calculated 
loss between different years. 

The Alfred Wegener Institute chemical box model, also used as the chemical
module in ATLAS, simulates 175 reactions between 48 chemical species in the
stratosphere. This model was used to perform conceptual runs (Supplementary
Fig. 4), started on 1 March with identical initial mixing ratios of all species
except HNO3 and O3. For these two species, values corresponding to 1997
(3 p.p.m.v. O3, 10 p.p.b.v. HNO3) and 2011 (2.2 p.p.m.v. O3, 6 p.p.b.v. HNO3)
(compare Figs 2a and 4c) were combined to yield four sets of initial conditions.
Initial ClOx was 2 p.p.b.v., corresponding to the vortex-averaged ClOx derived by
ATLAS from MLS ClO measurements on 1 March 2011. An air parcel at 70° N,
460 K potential temperature, with a temperature of 193 K throughout March, was
used. Heterogeneous reactions took place on liquid aerosols, rather than solid
(nitric acid trihydrate, NAT) PSCs, since the widespread existence of the latter is
inconsistent with MLS observations of gas-phase HNO3 values (Fig. 2a) larger
than those the microphysical module predicts if NAT is present. A sensitivity
run showed that sporadically occurring solid PSCs did not change the results
significantly.
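
The four sets of initial conditions follow from pairing each of the two ozone values with each of the two HNO3 values. The sketch below enumerates them; the numerical values are those given in the text, while the dictionary keys and function name are hypothetical and no box-model chemistry is run.

```python
from itertools import product

# Initial mixing ratios quoted in the text for the two winters.
O3_PPMV = {1997: 3.0, 2011: 2.2}
HNO3_PPBV = {1997: 10.0, 2011: 6.0}

# Settings shared by all runs (values from the text; key names are assumed).
FIXED = {
    "clox_ppbv": 2.0,
    "latitude_deg_n": 70.0,
    "theta_k": 460.0,
    "temperature_k": 193.0,
}


def initial_condition_sets():
    """Return the four combinations of 1997/2011 ozone and HNO3 values."""
    runs = []
    for (o3_year, o3), (hno3_year, hno3) in product(O3_PPMV.items(),
                                                    HNO3_PPBV.items()):
        run = dict(FIXED)
        run.update(o3_year=o3_year, o3_ppmv=o3,
                   hno3_year=hno3_year, hno3_ppbv=hno3)
        runs.append(run)
    return runs
```

Crossing the two factors in this way separates the effect of lower initial ozone from the effect of denitrification.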

Column ozone and ozone deficit calculation. OMI total ozone data were 
processed with version 8.5 of the TOMS algorithm and have been extensively 
validated. TOMS data were processed with version 8 of the algorithm. The OMI 
and TOMS total ozone data used in this study were averaged on a fixed global
1° × 1° latitude × longitude grid. Averages were computed by area-weighting
observations based on the overlap of their instantaneous field-of-view with each 
grid cell. Only data that satisfy quality criteria based on measurement path length 
and algorithm diagnostic criteria were included in the averaged samples. 
Individual total ozone retrievals included in the samples are expected to have a
root-mean-squared error of 1-2%. 

Total ozone ‘deficit’ is calculated as the difference between daily values and a 
reference that is minimally affected by chemical ozone loss. The reference for the 
Arctic is the daily mean over all Arctic winters from 1978-79 through to 2009-10, 
from OMI starting in 2004-05 and from TOMS for earlier years. The Antarctic
reference state is the daily mean of TOMS measurements for 1979 through to 
1981. Because the Antarctic reference state is based on only three years’ data for 
each day, variations in vortex position are not effectively averaged out; this 
reference is thus less robust than that for the Arctic, so patterns in daily maps 
may partially reflect differences in vortex position between the reference and the 
focus day. 
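
The deficit calculation can be sketched for a single grid cell and calendar day as follows. The function names are hypothetical, and the sign convention (reference minus observed, so that a positive deficit means less ozone than the reference) is an assumption made here, since the text describes the deficit only as a difference.

```python
def daily_reference(values_by_year_du):
    """Mean total ozone (DU) over the reference years for one calendar day.

    Missing years (None) are skipped.
    """
    values = [v for v in values_by_year_du if v is not None]
    if not values:
        raise ValueError("no reference data for this day")
    return sum(values) / len(values)


def ozone_deficit(observed_du, reference_du):
    """Deficit in DU; assumed positive when ozone is below the reference."""
    return reference_du - observed_du
```

With only three reference years, as for the Antarctic, the mean is more sensitive to where the vortex happened to sit in those years, which is the caveat noted above.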
The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the
sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has
undergone purifying selection, and locate constrained elements covering ~4.2% of the genome. We use evolutionary
signatures and comparisons with experimental data sets to suggest candidate functions for ~60% of constrained bases. 
These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 
regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural 
families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report 
specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from 
mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated
variants indicates that our findings will be relevant for studies of human biology, health and disease.


A key goal in understanding the human genome is to discover and 
interpret all functional elements encoded within its sequence. Although 
only ~1.5% of the human genome encodes protein sequence, comparative
analysis with the mouse, rat and dog genomes showed that
at least 5% is under purifying selection and thus probably functional, of 
which ~3.5% consists of non-coding elements with probable regula- 
tory roles. Detecting and interpreting these elements is particularly 
relevant to medicine, as loci identified in genome-wide association 
studies (GWAS) frequently lie in non-coding sequence.

Although initial comparative mammalian studies could estimate 
the overall proportion of the genome under evolutionary constraint, 
they had little power to detect most of the constrained elements— 
especially the smaller ones. Thus, they focused only on the top 5% of 
constrained sequence, corresponding to less than ~0.2% of the 
genome. In 2005, we began an effort to generate sequence from a
large collection of mammalian genomes with the specific goal of identifying
and interpreting functional elements in the human genome on
the basis of their evolutionary signatures. Here we report our results
to systematically characterize mammalian constraint using 29 eutherian 
(placental) genomes. We identify 4.2% of the human genome as con- 
strained and ascribe potential function to ~60% of these bases using 
diverse lines of evidence for protein-coding, RNA, regulatory and chro- 
matin roles, and we present evidence of exaptation and accelerated 
evolution. All data sets described here are publicly available in a com- 
prehensive data set at the Broad Institute and University of California, 
Santa Cruz (UCSC). 


Sequencing, assembly and alignment 

We generated genome sequence assemblies for 29 mammalian species 
selected to achieve maximum divergence across the four major mammalian
clades (Fig. 1a and Supplementary Text 1 and Supplementary
Table 1). For nine species, we used genome assemblies based on ~7-fold
coverage shotgun sequence, and for 20 species we generated ~2-fold
coverage (2×), to maximize the number of species sequenced
Figure 1 | Phylogeny and constrained elements from the 29 eutherian
mammalian genome sequences. a, A phylogenetic tree of all 29 mammals
used in this analysis based on the substitution rates in the MultiZ alignments.
Organisms with finished genome sequences are indicated in blue, high quality
drafts in green and 2× assemblies in black. Substitutions per 100 bp are given
for each branch; branches with ≥10 substitutions are coloured red, blue
indicates <10 substitutions. b, At 10% FDR, 3.6 million constrained elements
can be detected encompassing 4.2% of the genome, including a substantial
fraction of newly detected bases (blue) compared to the union of the HMRD 50-bp
+ Siepel vertebrate elements (see Supplementary Fig. 4b for comparison to
HMRD elements only). The largest fraction of constraint can be seen in coding 
exons, introns and intergenic regions. For unique counts, the analysis was 
performed hierarchically: coding exons, 5’ UTRs, 3’ UTRs, promoters, 
pseudogenes, non-coding RNAs, introns, intergenic. The constrained bases are 
particularly enriched in coding transcripts and their promoters 
(Supplementary Fig. 4c). 


with available resources on capillary machines. Twenty genomes are 
first reported here, and nine were previously described (see Sup- 
plementary Information). 

The power to detect constrained elements depends largely on the total 
branch length of the phylogenetic tree connecting the species. The
29 mammals correspond to a total effective branch length of ~4.5
substitutions per site, compared to ~0.68 for the human-mouse-rat-dog
comparison (HMRD), and thus should offer greater power
to detect evolutionary constraint: the probability that a genomic
sequence not under purifying selection will remain fixed across
all 29 species is P_1 < 0.02 for single bases and P_12 < 10⁻²⁵ for
12-nucleotide sequences, compared to P_1 ≈ 0.50 and P_12 ≈ 10⁻³ for
HMRD.

For mammals for which we generated 2× coverage, our assisted
assembly approach resulted in a typical contig size N50_C of 2.8 kb
and a typical scaffold size N50_S of 51.8 kb (Supplementary Text 2
and Supplementary Table 1) and high sequence accuracy (96% of
bases had quality score Q20, corresponding to a <1% error rate).
Compared to high-quality sequence across the 30 Mb of the ENCODE
pilot project, we estimated average error rates of 1-3 miscalled bases
per kilobase, which is ~50-fold lower than the typical nucleotide
sequence difference between the species, enabling high-confidence
detection of evolutionary constraint (Supplementary Text 3).
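
The two assembly quality measures quoted above have standard definitions that are easy to state in code. The sketch below uses the conventional N50 (the length such that contigs or scaffolds at least that long contain half of the assembled bases) and the conventional Phred relation between quality score and error probability; the function names are illustrative, and the text does not state that the authors computed the values in exactly this way.

```python
def n50(lengths):
    """N50 of a set of contig or scaffold lengths.

    Returns the largest length L such that sequences of length >= L
    together contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("no lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be positive")


def phred_error_probability(quality):
    """Base-call error probability for a Phred quality score Q: 10^(-Q/10)."""
    return 10.0 ** (-quality / 10.0)
```

Under the Phred relation, a score of 20 corresponds to an error probability of 1%, so bases at or above that score have an error rate of at most 1%.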

We based our analysis on whole-genome alignments by MultiZ 
(Supplementary Text 4). The average number of aligned species was 
20.9 at protein-coding positions in the human genome and 23.9 at the 
top 5% HMRD-conserved non-coding positions, with an average 
branch length of 4.3 substitutions per base in these regions (Sup- 
plementary Figs 1 and 2). In contrast, whole-genome average align- 
ment depth is only 17.1 species with 2.9 substitutions per site, 
probably due to large deletions in non-functional regions. The depth
at ancestral repeats is 11.4 (Supplementary Fig. 1a), consistent with
repeats being largely non-functional.


Detection of constrained sequence 


Our analysis did not substantially change the estimate of the propor- 
tion of genome under selection. By comparing genome-wide conser- 
vation to that of ancestral repeats, we estimated the overall fraction of 
the genome under evolutionary constraint to be 5.36% at 50-bp windows
(5.44% at 12-bp windows), using the SiPhy-ω statistic, a measure of
overall substitution rate (Supplementary Fig. 3), consistent with previous
similar estimates. However, alternative methods and different
ways of correcting for the varying alignment depths give higher estimates
(see Supplementary Text 5 for details).

The additional species had a marked effect on our ability to identify 
the specific elements under constraint. With 29 mammals, we pinpoint 
3.6 million elements spanning 4.2% of the genome, at a finer resolution 
of 12 bp (Fig. 1b and Supplementary Text 6, Supplementary Fig. 4, 
Supplementary Tables 2 and 3), compared to <0.1% of the genome
for HMRD 12-bp elements and 2.0% for HMRD 50-bp elements.
Elements previously detected using five vertebrates cover a
larger fraction of the genome (~4.1%), but overlap only 45% of the
mammalian elements detected here, suggesting that a large fraction
of our elements are mammalian specific. The mean element size
(36 bp) is considerably shorter than both previously detected
HMRD elements (123 bp) and five-vertebrate elements (104 bp).
For example, it is now possible to detect individual binding sites for 
the neuron-restrictive silencer factor (NRSF) in the promoter of the 
NPAS4 gene, which are beyond detection power in previous data sets 
(Fig. 2 and Supplementary Fig. 5). We found a similar regional distri- 
bution of 12-bp elements (including the 2.6 million newly detected 
constrained elements) to previously detected HMRD elements 
(r = 0.94, Supplementary Fig. 6). Similar results were obtained with 
the PhastCons statistic (see Supplementary Text 6).

Using a new method, SiPhy-π, sensitive not just to the substitution
rate but also to biases in the substitution pattern (for example, positions
free to mutate between G and T only, Supplementary Fig. 7), we
detected an additional 1.3% of the human genome in constrained 
elements (see Supplementary Tables 2 and 3). Most of the newly 
detected constrained nucleotides extend elements found by rate- 
based methods, but 22% of nucleotides lie in new elements (average 
length 17 bp) and are enriched in non-coding regions. 
Figure 2 | Identification of four NRSF-binding sites in NPAS4. a, The
neurological gene NPAS4 has many constrained elements overlapping introns 
and the upstream intergenic region. The grey shaded box contained only one 
constrained element using HMRD, whereas analysis of 29 mammalian 
sequences reveals four smaller elements. b, These four constrained elements in 
the first intron correspond to binding sites for the NRSF transcription factor, 
known to regulate neuronal lineages. 


Constraint within the human population 

We observed that the evolutionary constraint acting on the 29 mammals 
is correlated with constraint within the human population, as assessed 
from human polymorphism data (Supplementary Text 7) and consist- 
ent with previous studies'*. Mammalian constrained elements show a 
depletion in single-nucleotide polymorphisms (SNPs)’”, and more 
constrained elements show even greater depletion. For example, in 
the top 1% most strongly conserved non-coding regions, SNPs occur 
at a 1.9-fold lower rate than the genome average, and the derived alleles 
have a lower frequency, consistent with purifying selection at many of 
these sites in the human population. 

Moreover, at positions with biased substitution patterns across 
mammals, the observed human SNPs show a similar bias to the one 
observed across mammals (Supplementary Fig. 7). Thus, not only are 
constrained regions less likely to exhibit polymorphism in humans, 
but when such polymorphisms are observed, the derived alleles in 
humans tend to match the alleles present in non-human mammals, 
indicating a preference for the same alleles across both mammalian 
and human evolution. 


Functional annotation of constraint 


We first studied the overlap of the 3.6 million evolutionarily con- 
strained elements (w < 0.8 and P< 10 '°) with known gene annota- 
tions (Fig. 1b). Roughly 30% of constrained elements were associated 
with protein-coding transcripts: ~25.3% overlap mature messenger 
RNAs (including 19.6% in coding exons, 1.2% in 5’ untranslated 
regions (5’ UTRs) and 4.4% in 3’ UTRs), and an additional 4.4% 
reside within 2 kb of transcriptional start sites (1.2% reside within 
200 bases). 

The majority of constrained elements, however, reside in intronic 
and intergenic regions (29.7% and 38.6%, respectively). To study their 
biological roles and provide potential starting points to understand 
these large and mostly uncharted territories, we next studied their 
overlap with evolutionary signatures characteristic of specific
types of features and a growing collection of public large-scale experi- 
mental data. 


Protein-coding genes and exons 

Despite intense efforts to annotate protein-coding genes over the past 
decade, we detected 3,788 candidate new exons (a 2% increase)
using evolutionary signatures characteristic of protein-coding exons.
Of these, 54% reside outside transcripts of protein-coding genes, 19% 
within introns, and 13% in UTRs of known coding genes (Supplemen- 
tary Text 8, Supplementary Tables 4 and 5). Our methods recovered 
92% of known coding exons that were larger than 10 codons and fall in
syntenic regions, the remainder showing non-consensus splice sites, 
unusual features, or poor conservation. 

The majority of new exon candidates (>58%) are supported by 
evidence of transcription measured in 16 human tissues*® (Sup- 
plementary Fig. 8a) or similarity to known Pfam protein domains. 
Thirty-one per cent of intronic and 13% of intergenic predictions 
extend known transcripts, and 5% and 11% respectively reside in 
new transcript models. The newly detected exons are more tissue 
specific than known exons (mean of 3 tissues versus 12) and are 
expressed at fivefold lower levels. Directed experiments and manual 
curation will be required to complete the annotation of the few hundred 
protein-coding genes that probably remain unannotated”’. 

We found apparent stop codon readthrough” of four genes based on 
continued protein-coding constraint after an initial conserved stop 
codon” and until a subsequent stop codon (Supplementary Text 9 
and Supplementary Fig. 8b). Readthrough in SACM1L could be triggered
by an 80-base conserved RNA stem loop predicted by RNAz”, lying 
four bases downstream of the readthrough stop codon. 

We also detected coding regions with a very low synonymous sub- 
stitution rate, indicating additional sequence constraints beyond the 
amino acid level (Supplementary Text 9). We found >10,000 such 
synonymous constraint elements (SCEs) in more than one-quarter of 
all human genes”’. Initial analysis indicates potential roles in splicing 
regulation (34% span an exon-exon junction), A-to-I editing, 
microRNA (miRNA) targeting and developmental regulation. HOX 
genes contain several top candidates (Fig. 3a), including two previ- 
ously validated developmental enhancers”. 


RNA structures and families of structural elements 

We next used evolutionary signatures characteristic of conserved 
RNA secondary structures* to reveal 37,381 candidate structural ele- 
ments (Supplementary Text 10 and Supplementary Fig. 9a), covering 
~1% of constrained regions. For example, the XIST large intergenic 
non-coding RNA (lincRNA), known to bind chromatin and enable 
X-chromosome inactivation*’, contains a newly predicted structure in 
its 3’ end (Supplementary Fig. 9b, f)—distinct from other known 
structures**—which seems to be the source of chromatin-associated 
short RNAs”. 

Sequence- and structure-based clustering of predictions outside 
protein-coding exons revealed 1,192 novel families of structural 
RNAs (Supplementary Text 10). We focused on a high-scoring subset 
consisting of 220 families with 725 instances, which also showed the 
highest thermodynamic stability’? (Supplementary Figs 9a and 10), 
DNase hypersensitivity, expression pattern correlation across tissues 
and intergenic expression enrichment (Supplementary Fig. 9a). We 
also expanded both known and novel families by including additional 
members detected by homology to existing members. 

Noteworthy examples include: a glycyl-tRNA family, including a 
new member in POP1, involved in tRNA maturation and probably 
involved in feedback regulation of POP1; three intronic families of 
long hairpins in ion-channel genes known to undergo A-to-I RNA 
editing and possibly involved in regulation of the editing event; an 
additional member of a family of 5’ UTR hairpins overlapping the 
start codon of collagen genes and potential new miRNA genes that 
extend existing families”. 

Two of the largest novel families consist of short AU-rich hairpins of 
6-7 bp that share the same strong consensus motif in their stem. These 
occur in the 3’ UTRs of genes in several inflammatory response path- 
ways, the post-transcriptional regulation of which often involves AU- 
rich elements (AREs). Indeed, two homologous hairpins in TNF and 
CSF3 correspond to known mRNA-destabilization elements, suggest- 
ing roles in mRNA stability for the two families’. 

Lastly, a family of six conserved hairpin structures (Supplementary
Fig. 9g) was found in the 3’ UTR of the MAT2A gene, which is
involved in the synthesis of S-adenosylmethionine (SAM), the primary
methyl donor in human cells. All six hairpins consist of a 12-18-bp
stem and a 14-bp loop region with a deeply conserved sequence motif
(Supplementary Fig. 9e), and may be involved in sensing SAM
concentrations, which are known to affect MAT2A mRNA stability.

Figure 3 | Examination of evolutionary signatures identifies SCEs and
evidence of positive selection. a, Two regions within the HOXA2 open reading
frame are identified as SCEs (red), corresponding to overlapping functional
elements within coding regions. Note that the synonymous rate reductions are
not obvious from the base-wise conservation measure (in blue). Both elements
have been characterized as enhancers driving HOXA2 expression in distinct
segments of the developing mouse hindbrain. The element in the first exon
encodes Hox-Pbx-binding sites and drives expression in rhombomere 4 (ref.
33), whereas the element in the second exon contains Sox-binding sites and
drives expression in rhombomere 2 (ref. 32). Synonymous constraint elements
are also found in most other HOX genes, and up to a quarter of all genes.
b, Although ~85% of genes show only negative (purifying) selection and 9% of
genes show uniform positive selection, the remaining 6% of genes, including
ABI2, show only localized regions of positively selected sites. Each vertical bar
covers the estimated 95% confidence interval for dN/dS at that site (with values
of 0 truncated to 0.01 to accommodate the log scaling), and bars are coloured
according to a signed version of the SLR statistic for non-neutral evolution: blue
for sites under purifying selection, grey for neutral sites and red for sites under
positive selection.


Conservation patterns in promoters 

As different types of conservation in promoters may imply distinct 
biological functions”, we classified the patterns of conservation 
within core promoters into three categories: (1) those with uniformly 
‘high’ constraint (7,635 genes, 13,996 transcripts); (2) uniformly ‘low’
constraint (2,879 genes, 4,135 transcripts); and (3) ‘intermittent’ con- 
straint, consisting of alternating peaks and troughs of conservation 
(14,271 genes and 29,814 transcripts) (Supplementary Fig. 11a). High 
and intermittent constraint promoters are both associated with CpG 
islands (~66%), whereas low constraint promoters have significantly 
lower overlap (~41%), and all three classes show similar overlap with 
functional TATA boxes (2-3%, see Supplementary Text 11). 

These groups show distinct Gene Ontology enrichments (Sup- 
plementary Fig. 11b), with high-constraint promoters involved in 
development (P with Bonferroni correction (Pgonp) < 107 °°), inter- 
mittent constraint in basic cellular functions (Pgonr<5 X 107“), and 
low-constraint promoters in immunity, reproduction and perception, 
functions expected to be under positive selection and lineage-specific 
adaptation’. 
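
The Bonferroni-corrected P values quoted above can be illustrated with a minimal sketch: the raw P value is multiplied by the number of tests performed and capped at 1. The function name and the example numbers below are hypothetical and are not taken from the study.

```python
def bonferroni(p_value: float, n_tests: int) -> float:
    """Return the Bonferroni-corrected P value, capped at 1."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return min(1.0, p_value * n_tests)


# Hypothetical example: a raw P of 1e-8 across 5,000 tested categories.
corrected = bonferroni(1e-8, 5000)
print(f"corrected P = {corrected:.1e}")
```

The correction is conservative: it controls the chance of any false positive across all categories tested, at the cost of reduced sensitivity.
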

High constraint may reflect cooperative binding of many densely
binding factors, as previously suggested for developmental genes’. 
Intermittent constraint promoters, the peak-spacing distribution of 
which was suggestive of the periodicity of the DNA helix turns, may 
reflect loosely interacting factors (Supplementary Fig. 11c, d). Low 
constraint may reflect rapid motif turnover, under neutral drift or 
positive selection. 


Identifying specific instances of regulatory motifs 

Data from just four species (HMRD) was sufficient to create a cata- 
logue of known and novel motifs with many conserved instances 
across the genome”. The power to discover such motifs was high, 
because one can aggregate data across hundreds of motif instances. 
Not surprisingly, the additional genomes therefore had little effect on 
the ability to discover new motifs (known motifs showed 99% cor- 
relation in genome-wide motif conservation scores, Supplementary 
Figs 12 and 13). 

In contrast, the 29 mammalian genomes markedly improved our 
ability to detect individual motif instances, making it possible to pre- 
dict specific target sites for 688 regulatory motifs corresponding to 
345 transcription factors (Supplementary Fig. 14). We chose to 
identify motif instances at a false discovery rate (FDR) of 60%, repre- 
senting a reasonable compromise between specificity and sensitivity 
given the available discovery power (Supplementary Text 12), and 
matching the experimental specificity of chromatin immunoprecipita- 
tion (ChIP) experiments for identifying biologically significant targets”. 
Higher levels of stringency could be obtained by sequencing additional 
species. 

We identified 2.7 million conserved instances (Supplementary 
Table 6), enabling the construction of a regulatory network linking 
375 motifs to predicted targets, with a median of 21 predicted regu- 
lators per target gene (25th percentile, 10; 75th percentile, 39). The 
number of target sites (average, 4,277; 25th percentile, 1,407; 75th 
percentile, 10,782) are comparable to those found in ChIP experi- 
ments, and have the advantage that they are detected at nucleotide 
resolution, enabling us to use them to interpret disease-associated 
variants for potential regulatory functions. However, some motifs 
never reached high confidence values, and others did so at very few 
instances. 

The motif-based targets show strong agreement with experiment- 
ally defined binding sites from ChIP experiments (Supplementary 
Table 7). For long and distinct motifs, such as CTCF and NRSF, the 
fraction of instances overlapping experimentally observed binding 
matches the fraction predicted by the confidence score (for example, 
at 80% confidence 70% of NRSF motif instances overlapped bound 
sites, and at ~50% confidence 40% overlapped), despite potential 
confounding aspects such as condition-specific binding, overlapping 
motifs between factors, or non-specific binding. Moreover, increasing 
confidence levels showed increasing overlap with experimental binding 
(Supplementary Figs 14-16). For example, YY1 enrichment for bound 
sites increased from 42-fold to 168-fold by focusing on conserved 
instances. Lastly, combining motif conservation and experimental 
binding led to increased enrichment for candidate tissue-specific 
enhancers, suggesting that the two provide complementary informa- 
tion. Within bound regions, the evolutionary signal reveals specific 
motif instances with high precision (for example, Figs 2 and 4 and 
Supplementary Fig. 17). 


Chromatin signatures 

To suggest potential functions for the ~68% of ‘unexplained’ constrained
elements outside coding regions, UTRs or proximal promoters, we used
chromatin state maps from CD4 T cells (Supplementary Fig. 18) and nine
diverse cell types (Supplementary Text 13 and Supplementary Fig. 19).
In T cells, constrained elements were most enriched for
promoter-associated states (up to fivefold), an insulator state and a
specific repressed state (2.2-fold), and numerous enhancer states
(1.5-2-fold), together covering 7.1% of the unexplained elements at
2.1-fold enrichment. In the nine cell types, enriched promoter,
enhancer and insulator states cover 36% of unexplained elements at
~1.75-fold enrichment, with locations active in multiple cell types
showing even stronger enrichment (Supplementary Fig. 20).

Figure 4 | Using constraint to identify candidate mutations. Conservation
can help us resolve amid multiple SNPs the ones that disrupt conserved
functional elements and are likely to have regulatory roles. In this example, a
SNP (rs6504340) associated with tooth development is strongly linked to a
conserved intergenic SNP, rs8073963, 7.1 kb away, which disrupts a deeply
conserved Forkhead-family motif in a strong enhancer. Although the SNPs
shown here stem from GWAS on HapMap data, the same principle should be
applicable to associated variants detected by resequencing the region of interest.

Overall, chromatin states indicate possible functions (at 1.74-fold 
enrichment) for 37.5% (N = 987,985) of unexplained conserved ele- 
ments (27% of all conserved elements), suggesting meaningful asso- 
ciation for at least 16% of unexplained constrained bases. Although 
current experiments only provide nucleosome-scale (~200-bp) reso- 
lution, we expect higher-resolution experimental assays that more 
precisely pinpoint regulatory regions to show further increases in 
enrichment. The increased overlap observed with additional cell types 
suggests that new cell types will help elucidate additional elements. Of 
course, further experimental tests will be required to validate the 
predicted functional roles. 


Accounting for constrained elements 

Overall, ~30% of constrained elements are associated with
protein-coding transcripts, ~27% overlap specific enriched
chromatin states, ~1.5% novel RNA structures, and ~3% conserved 
regulatory motif instances (Supplementary Text 13, 14). Together, 
~60% of constrained elements overlap one of these features, with 
enrichments ranging from 1.75-fold for chromatin states (compared 
to unannotated regions) up to 17-fold for protein-coding exons (com- 
pared to the whole genome). 


Implications for interpreting disease-associated variants 

In the non-protein-coding genome, SNPs associated with human 
diseases in genome-wide association studies are 1.37-fold enriched 
for constrained regions, relative to HapMap SNPs (Supplementary 
Text 15 and Supplementary Table 8). This is notable because only a 
small proportion of the associated SNPs are likely to be causative, 
whereas the rest are merely in linkage disequilibrium with causative 
variants. 
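
A fold enrichment of this kind is a ratio of two fractions: the share of disease-associated SNPs that fall in constrained regions, divided by the same share for the background set (here, HapMap SNPs). The sketch below shows the arithmetic only; the function name and the counts are hypothetical, chosen for illustration, and are not data from the study.

```python
def fold_enrichment(hits_a: int, total_a: int, hits_b: int, total_b: int) -> float:
    """Fraction of set A inside constrained regions divided by that of set B."""
    if total_a <= 0 or total_b <= 0:
        raise ValueError("totals must be positive")
    if hits_b == 0:
        raise ValueError("background set has no hits; the ratio is undefined")
    return (hits_a / total_a) / (hits_b / total_b)


# Hypothetical counts: 137 of 1,000 associated SNPs versus
# 10,000 of 100,000 background SNPs lie in constrained regions.
gwas_ratio = fold_enrichment(137, 1000, 10_000, 100_000)
print(f"fold enrichment = {gwas_ratio:.2f}")
```
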

Accordingly, constrained elements should be valuable in focusing 
the search for causative variants among multiple variants in linkage 
disequilibrium. For example, in an intergenic region between HOXB1 
and HOXB2 associated with tooth development phenotypes”, the 
reported SNP (rs6504340) is not conserved, but a linked SNP 
(rs8073963) sits in a constrained element 7.1 kb away. Moreover, 
rs8073963 disrupts a deeply conserved FOXO2 motif instance within 
a predicted enhancer (Fig. 4), making it a candidate mutation for 
further follow-up. Similar examples of candidate causal variants are
found for diverse phenotypes such as height or multiple sclerosis, and 
similar analyses could be applied to case-control resequencing data. 


Evolution of constrained elements 

We next sought to identify signatures of positive selection that may 
accompany functional adaptations of different species to diverse 
environments and new ecosystems. 


Codon-specific selection 

We used the ratio dN/dS of non-synonymous to synonymous codon 
substitutions as evidence of positive selection (>1) or negative selec- 
tion (<1). Although dN/dS is typically calculated for whole genes, the 
additional mammals sequenced enabled analysis at the codon level: 
simulations predicted a 250-fold gain in sensitivity compared to 
HMRD, identifying 53% of positive sites at 5% FDR (Supplemen- 
tary Text 16). 

Applying this test to 6.05 million codons in 12,871 gene trees, we 
found evidence of strong purifying selection (dN/dS < 0.5) for 84.2% 
of codons and positive selection (dN/dS > 1.5) for 2.4% of codons 
(with 94.1% of sites <1 and 5.9% >1; Supplementary Table 9). At 5% 
FDR, we found 15,383 positively selected sites in 4,431 proteins. The 
genes fall into three classes based on the distribution of selective 
constraint: 84.8% of genes show uniformly high purifying selection, 
8.9% show distributed positive selection across their length and 6.3% 
show localized positive selection concentrated in small clusters 
(Fig. 3b and Supplementary Fig. 21, Supplementary Tables 10 and 11). 
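
The dN/dS cut-offs quoted above (below 0.5 for strong purifying selection, above 1.5 for positive selection) can be applied to per-codon estimates as in the following sketch. It uses point estimates only, whereas the study relies on a site-wise statistical test with FDR control; the function names and the example values are hypothetical.

```python
from collections import Counter


def classify_site(dn_ds: float) -> str:
    """Label a codon using the dN/dS cut-offs quoted in the text."""
    if dn_ds < 0.5:
        return "strong purifying"
    if dn_ds > 1.5:
        return "positive"
    return "intermediate"


def summarize(values):
    """Return the fraction of sites that fall in each class."""
    if not values:
        raise ValueError("at least one dN/dS value is required")
    counts = Counter(classify_site(v) for v in values)
    return {label: n / len(values) for label, n in counts.items()}


# Hypothetical per-codon estimates, not data from the study.
example = [0.05, 0.2, 0.4, 0.9, 1.1, 2.3]
fractions = summarize(example)
print(fractions)
```
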

Genes with distributed positive selection were enriched in such 
functional categories as immune response (Pgonr< 10 1°) and taste 
perception (Pgonr< 10 *°), which are known to evolve rapidly, but 
also in some unexpected functions such as meiotic chromosome 
segregation (Pponr < 10 7°) and DNA-dependent regulation of tran- 
scription (Ppon¢< 10 19. Supplementary Table 12). Localized positive 
selection was enriched in core biochemical processes, including micro- 
tubule-based movement (Ppon¢< 10° °°), DNA topological change 
(Pponf< 10 *) and telomere maintenance (Ppont< 7 X 10°), sug- 
gesting adaptation at important functional sites. 

Focusing on 451 unique Pfam protein-domain annotations, we 
found abundant purifying selection, with 225 domains showing puri- 
fying selection for >75% of their sites, and 447 domains showing 
negative selection for >50% of their sites (Supplementary Table 13). 
Domains with substantial fractions of positively selected sites include 
CRAL/TRIO involved in retinal binding (2.6%), proteinase-inhibitor- 
cystatin involved in bone remodelling (2.2%) and the secretion-related 
EMP24/GOLD/P24 family (1.6%). 


Exaptation of mobile elements 

Mobile elements provide an elegant mechanism for distributing a 
common sequence across the genome, which can then be retained 
in locations where it confers advantageous regulatory functions to the 
host—a process termed exaptation. Our data revealed >280,000 
mobile element exaptations common to mammalian genomes cover- 
ing ~7 Mb (Supplementary Text 17), a considerable expansion from 
the ~10,000 previously recognized cases“. Of the ~1.1 million con- 
strained elements that arose during the 90 million years between the 
divergence from marsupials and the eutherian radiation, we can trace 
>19% to mobile element exaptations. Often only a small fraction 
(median ~ 11%) of each mobile element is constrained, in some cases 
matching known regulatory motifs. Recent exaptations are generally 
found near ancestral regulatory elements, except in gene deserts, 
which are abundant in ancestral elements but show few recent exap- 
tations (P< 10 °°, Supplementary Fig. 22). 


Accelerated evolution in the primate lineage

Lineage-specific rapid evolution in ancestrally constrained elements
previously revealed human positive selection associated with brain
and limb development. Applying this signature to the human and
primate lineages, we identified 563 human-accelerated regions
(HARs) and 577 primate-accelerated regions (PARs) at FDR <10% 
(Supplementary Text 18, Supplementary Tables 14 and 15), signifi- 
cantly expanding the 202 previously known HARs”. Fifty-four HARs 
(9.4%) and 49 PARs (8.5%) overlap enhancer-associated chromatin 
marks and experimentally validated enhancers (Supplementary Text 
18). Substitution patterns in HARs suggest that GC-biased gene con- 
version (BGC) is not responsible for the accelerated evolution in the 
vast majority of these regions (~15% show evidence of BGC). 
Genes harbouring or neighbouring HARs and PARs are enriched 
for extracellular signalling, receptor activity, immunity, axon guidance, 
cartilage development and embryonic pattern specification (Sup- 
plementary Fig. 23). For example, the FGF13 locus associated with 
an X-linked form of mental retardation contains four HARs near the 
5’ ends of alternatively spliced isoforms of FGF13 expressed in the 
nervous system, epithelial tissues and tumours, suggesting human- 
specific changes in isoform regulation (Supplementary Fig. 24). 


Discussion 


Comparative analysis of 29 mammalian genomes reveals a high- 
resolution map of >3.5 million constrained elements that encompass 
~4% of the human genome and suggest potential functional classes 
for ~60% of the constrained bases; the remaining 40% show no 
overlap and remain uncharacterized. We report previously undetec- 
ted exons and overlapping functional elements within protein-coding 
sequence, new classes of RNA structures, promoter conservation pro- 
files and predicted targets of transcriptional regulators. We also pro- 
vide evidence of evolutionary innovation, including codon-specific 
positive selection, mobile element exaptation and accelerated evolu- 
tion in the primate and human lineages. 

By focusing our comparison on only eutherian mammals, we dis- 
cover functional elements relevant to this clade, including recent 
eutherian innovations. This is especially important for discovering 
regulatory elements, which can be subject to rapid turnover’. 
Indeed, a previous comparison indicated that only 80% of 50-bp 
non-coding elements are shared with opossum“, and the current 
12-bp analysis shows ~64% of non-coding elements shared with 
opossum, and only 6% with stickleback fish. Many eutherian elements 
are thus probably missing from previous maps of vertebrate 
constraint”. 

Sequencing of additional species should enable discovery of lineage- 
specific elements within mammalian clades, and provide increased 
resolution for shared mammalian constraint. We estimate that 100- 
200 eutherian mammals (15-25 neutral substitutions per site) will 
enable single-nucleotide resolution. The majority of this branch length 
is present within the Laurasiatherian and Euarchontoglire branches, 
which also contain multiple model organisms. These are ideal next 
targets for sequencing as part of the Genome 10K effort*’, aiming to 
sequence 10,000 vertebrate species. Within the primate clade, a branch 
length of ~1.5 could probably be achieved, enabling primate-specific 
selection studies, albeit at lower resolution. Lastly, human-specific 
selection should be detectable by combining data across genomic 
regions and by comparing thousands of humans”. 
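
One way to see why greater branch length sharpens resolution is to ask how often a neutral site would look conserved by chance. The sketch below assumes a simple Poisson substitution model with independent sites, which is an illustrative simplification and not the model used in the study: a neutral site then shows no substitution across a tree of total branch length T with probability exp(-T).

```python
import math


def prob_unchanged(branch_length: float, n_sites: int = 1) -> float:
    """Probability that n neutral sites all show zero substitutions.

    Assumes the number of substitutions per site is Poisson with mean equal
    to the total branch length, and that sites evolve independently.
    """
    if branch_length < 0:
        raise ValueError("branch_length must be non-negative")
    if n_sites < 1:
        raise ValueError("n_sites must be at least 1")
    return math.exp(-branch_length * n_sites)


# Branch lengths of 15 and 25 substitutions per site, as quoted in the text.
for t in (15, 25):
    print(f"T = {t}: P(single neutral site unchanged) = {prob_unchanged(t):.1e}")
```

Under these assumptions a single unchanged neutral site becomes very unlikely at the quoted branch lengths, which is consistent with the expectation that constraint could then be called one nucleotide at a time.
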

The constrained elements reported here can be used to prioritize 
disease-associated variants for subsequent study, providing a power- 
ful lens for elucidating functional elements in the human genome 
complementary to ongoing large-scale experimental endeavours such 
as ENCODE and Roadmap Epigenomics. Experimental studies 
require prior knowledge of the biochemical activity sought and reveal 
regions active in specific cell types and conditions. Comparative 
approaches provide an unbiased catalogue of shared functional 
regions independent of biochemical activity or condition, and thus 
can capture experimentally intractable or rare activity patterns. With 
increasing branch length, they can provide information on ancestral 
and recent selective pressures across mammalian clades and within 
the human population. Ultimately, the combination of disease
genetics, comparative and population genomics and biochemical
studies has important implications for understanding human
biology, health and disease.


METHODS SUMMARY 


A full description of materials and methods, including sample selection and 
sequencing strategy, assembly strategies and results, error estimation and correc- 
tion, alignment details, estimation of genome portion under constraint, detection 
of constrained elements, mammalian constraint versus human polymorphism, 
protein coding genes, detection of stop codon readthrough and synonymous 
constraint elements, RNA structure detection, patterns of promoter constraint, 
regulatory motif discovery, correlation with chromatin state information, overall 
accounting of constraint elements, comparison with disease-associated variants, 
detection of codon-specific positive selection, exaptation of ancestral repeat ele- 
ments, and human and primate accelerated regions is available in Supplementary 
Information. All animal experiments were approved by the MIT Committee for 
Animal Care. 


Spatio-temporal transcriptome of the human brain

Brain development and function depend on the precise regulation of gene expression. However, our understanding of the
complexity and dynamics of the transcriptome of the human brain is incomplete. Here we report the generation and 
analysis of exon-level transcriptome and associated genotyping data, representing males and females of different 
ethnicities, from multiple brain regions and neocortical areas of developing and adult post-mortem human brains. We 
found that 86 per cent of the genes analysed were expressed, and that 90 per cent of these were differentially regulated at 
the whole-transcript or exon level across brain regions and/or time. The majority of these spatio-temporal differences 
were detected before birth, with subsequent increases in the similarity among regional transcriptomes. The transcriptome 
is organized into distinct co-expression networks, and shows sex-biased gene expression and exon usage. We also profiled
trajectories of genes associated with neurobiological categories and diseases, and identified associations between single 
nucleotide polymorphisms and gene expression. This study provides a comprehensive data set on the human brain 
transcriptome and insights into the transcriptional foundations of human neurodevelopment. 


Human neurodevelopment is a complex and precisely regulated pro- 
cess that occurs over a protracted period of time’*. Human-specific 
features of this process are likely to be important factors in the evolu- 
tion of human specializations*°. However, in addition to giving us 
remarkable cognitive and motor abilities, the formation of molecularly 
distinct and intricate neural circuits may have also increased our
susceptibility to certain psychiatric and neurological disorders.
Furthermore, sex differences are important in brain development 
and function, and are a risk factor for conditions such as autism spec- 
trum disorders (ASDs) and depression’ **. Research and progress in all 
these areas could be enhanced by a comprehensive analysis of the 
spatio-temporal dynamics of gene expression and transcript variants 
in the human brain. 

Previous transcriptome studies of the developing human brain 
have used relatively small numbers of samples and predominantly 
focused on only a few regions or developmental time points’*"*. In 
this Article, we explore the transcriptomes of 16 regions comprising 
the cerebellar cortex, mediodorsal nucleus of the thalamus, striatum, 
amygdala, hippocampus and 11 areas of the neocortex. The data set 
was generated from 1,340 tissue samples collected from 57 developing 
and adult post-mortem brains of clinically unremarkable donors 
representing males and females of multiple ethnicities. 


Study design, data generation and quality control 

To investigate the spatio-temporal dynamics of the human brain 
transcriptome, we created a 15-period system spanning the periods 
from embryonic development to late adulthood (Table 1 and Sup- 
plementary Information, section 2.1). We sampled transient prenatal 


structures and immature and mature forms of 16 brain regions, 
including 11 neocortex (NCX) areas, from multiple specimens per 
period (Table 2; Supplementary Information, section 2.2; Sup- 
plementary Figs 1-3; and Supplementary Table 1). The 11 NCX areas 
are collectively referred to hereafter as the region NCX. We also 
genotyped donor DNA using an Illumina 2.5-million single nucleotide 
polymorphism (SNP) chip (Supplementary Fig. 4 and Supplementary 
Table 2). Only brains from clinically unremarkable donors with no signs
of large-scale genomic abnormalities were included in the study (N = 57,
including 39 with both hemispheres; age, 5.7 weeks post-conception
(PCW) to 82 years; sex, 31 males and 26 females; post-mortem interval,
12.11 ± 8.63 (mean ± s.d.) hours; pH, 6.45 ± 0.34 (mean ± s.d.)).

Table 1 | Periods of human development and adulthood as defined
in this study

Period  Description                  Age
1       Embryonic                    4 PCW ≤ Age < 8 PCW
2       Early fetal                  8 PCW ≤ Age < 10 PCW
3       Early fetal                  10 PCW ≤ Age < 13 PCW
4       Early mid-fetal              13 PCW ≤ Age < 16 PCW
5       Early mid-fetal              16 PCW ≤ Age < 19 PCW
6       Late mid-fetal               19 PCW ≤ Age < 24 PCW
7       Late fetal                   24 PCW ≤ Age < 38 PCW
8       Neonatal and early infancy   0 M (birth) ≤ Age < 6 M
9       Late infancy                 6 M ≤ Age < 12 M
10      Early childhood              1 Y ≤ Age < 6 Y
11      Middle and late childhood    6 Y ≤ Age < 12 Y
12      Adolescence                  12 Y ≤ Age < 20 Y
13      Young adulthood              20 Y ≤ Age < 40 Y
14      Middle adulthood             40 Y ≤ Age < 60 Y
15      Late adulthood               60 Y ≤ Age

M, postnatal months; PCW, post-conceptional weeks; Y, postnatal years.

Table 2 | Ontology and nomenclature of analysed brain regions and
NCX areas

Periods 1 and 2                     Periods 3-15

FC, frontal cerebral wall           OFC, orbital prefrontal cortex
                                    DFC, dorsolateral prefrontal cortex
                                    VFC, ventrolateral prefrontal cortex
                                    MFC, medial prefrontal cortex
                                    M1C, primary motor (M1) cortex

PC, parietal cerebral wall          S1C, primary somatosensory (S1) cortex
                                    IPC, posterior inferior parietal cortex

TC, temporal cerebral wall          A1C, primary auditory (A1) cortex
                                    STC, superior temporal cortex
                                    ITC, inferior temporal cortex

OC, occipital cerebral wall         V1C, primary visual (V1) cortex

HIP, hippocampal anlage             HIP, hippocampus

VF, ventral forebrain               AMY, amygdala
MGE, medial ganglionic eminence     STR, striatum
LGE, lateral ganglionic eminence
CGE, caudal ganglionic eminence

DIE, diencephalon                   MD, mediodorsal nucleus of the thalamus
DTH, dorsal thalamus

URL, upper (rostral) rhombic lip    CBC, cerebellar cortex
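
As an illustration only, the period boundaries in Table 1 can be encoded as a small lookup. The sketch below is not the authors' code: the function name `assign_period`, the unit labels and the conversion of months to years are assumptions made for the example, and ages that Table 1 does not cover raise an error.

```python
# Upper bounds (exclusive) transcribed from Table 1.
# Prenatal ages are in post-conceptional weeks (PCW); postnatal ages in years.
PRENATAL_UPPER_PCW = [(8, 1), (10, 2), (13, 3), (16, 4), (19, 5), (24, 6), (38, 7)]
POSTNATAL_UPPER_YEARS = [(0.5, 8), (1, 9), (6, 10), (12, 11), (20, 12), (40, 13), (60, 14)]


def assign_period(age, unit):
    """Return the Table 1 period (1-15) for an age given in 'PCW', 'M' or 'Y'."""
    if unit == "PCW":
        if age < 4:
            raise ValueError("Table 1 starts at 4 PCW")
        for upper, period in PRENATAL_UPPER_PCW:
            if age < upper:
                return period
        raise ValueError("prenatal ages of 38 PCW or more are not covered by Table 1")
    if unit == "M":
        # Assumption for this sketch: postnatal months are converted to years.
        age, unit = age / 12.0, "Y"
    if unit != "Y":
        raise ValueError("unit must be 'PCW', 'M' or 'Y'")
    if age < 0:
        raise ValueError("postnatal age must be non-negative")
    for upper, period in POSTNATAL_UPPER_YEARS:
        if age < upper:
            return period
    return 15  # 60 Y <= Age


# The youngest and oldest donors in the study fall in the first and last periods.
print(assign_period(5.7, "PCW"), assign_period(82, "Y"))
```
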

Transcriptome profiling was performed using total RNA extracted 
from a total of 1,340 dissected tissue samples (RNA integrity number, 
8.83 ± 0.93 (mean ± s.d.); Supplementary Tables 3 and 4). We used the
Affymetrix GeneChip Human Exon 1.0 ST Array platform, which fea- 
tures comprehensive coverage of the human genome, with 1.4 million 
probe sets that assay expression across the entire transcript or indi- 
vidual exon, thereby providing redundancy and increased confidence 
in estimates of gene-level differential expression (DEX, differentially 
expressed) and differential exon usage (DEU). Descriptions of tissue 
sampling and quality control measures implemented throughout tran- 
scriptome data generation steps are provided in Supplementary 
Information, sections 2-5, and Supplementary Figs 5-8. 


Global transcriptome dynamics 

Spatio-temporal gene expression 

After quality control assessments and quantile normalization, we
summarized core and unique probe sets, representing 17,565 mainly
protein-coding genes, into gene-level information. Using stringent
criteria (log2-transformed signal intensity of ≥6 in at least one sample
and mean detection-above-background P value of <0.01 in at least
one region of at least one period) to define an ‘expressed’ gene, we
found that 15,132 (86.1%) of 17,565 genes surveyed were expressed in
at least one brain region during at least one period, and that 14,375
(81.8%) were expressed in at least one NCX area (Fig. 1a, Supplemen-
tary Information 6.1 and Supplementary Fig. 9). To investigate the
contributions of different factors to the global transcriptome
dynamics, we applied multidimensional scaling and principal-
component analysis, which revealed that region and age (that is,
spatio-temporal dynamics) contribute more to the global differences
in gene expression than do other tested variables: sex, ethnicity and
inter-individual variation (Fig. 1b; Supplementary Information,
sections 6.2 and 6.3; and Supplementary Figs 10 and 11).

Figure 1 | Global spatio-temporal dynamics of gene expression. a, Venn
diagrams representing the total number of genes considered to be expressed
and the number of spatially and temporally DEX genes for brain regions (top)
and NCX areas (bottom). b, Multidimensional scaling (MDS) plot showing
transcriptional similarity, coloured by period (top) and region (bottom). Non-
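
The two-part definition of an 'expressed' gene given in the text can be sketched as a simple predicate. This is a minimal illustration, not the authors' pipeline: the function name `is_expressed`, the input layout (a list of per-sample log2 intensities and a dictionary of detection P values keyed by a hypothetical `(region, period)` tuple) and the parameter names are all assumptions.

```python
def is_expressed(log2_intensity, dabg_p_by_group, min_log2=6.0, max_mean_p=0.01):
    """Apply both criteria from the text to a single gene.

    log2_intensity:  per-sample log2-transformed signal intensities.
    dabg_p_by_group: {(region, period): [per-sample detection P values]}.
    """
    # Criterion 1: log2 signal intensity of at least 6 in at least one sample.
    intensity_ok = any(x >= min_log2 for x in log2_intensity)
    # Criterion 2: mean detection P value below 0.01 in at least one
    # region of at least one period.
    detection_ok = any(
        sum(ps) / len(ps) < max_mean_p
        for ps in dabg_p_by_group.values()
        if ps
    )
    return intensity_ok and detection_ok


example = is_expressed([5.1, 6.3], {("NCX", 3): [0.001, 0.004]})
print(example)
```

Both conditions must hold, so a gene with a strong signal but no well-detected region-period group is not counted as expressed.
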

To identify genes that were spatially or temporally regulated, we used 
a conservative threshold (false-discovery-rate Q value of <0.01 and
≥2-fold log2-transformed signal intensity difference), included post-
mortem interval and RNA integrity number as technical covariates
within all of our analysis-of-variance models of differential expression, 
considered the influence of dissection variation and applied a fivefold 
jackknife procedure (Supplementary Information, section 6.4, and 
Supplementary Figs 12 and 13). We found that 70.9% of expressed 
genes were spatially DEX between any two regions within at least 
one period, and that 24.1% were spatially DEX between any two 
NCX areas (Fig. 1a). By contrast, 89.9% of expressed genes were tem- 
porally DEX between any two periods across regions, and 85.3% were 
temporally DEX between any two periods across NCX areas. Moreover, 
70.0% and 23.9% of expressed genes were both spatially and temporally 
DEX within brain regions and within NCX areas, respectively. The bulk 
of spatio-temporal regulation occurred during prenatal development. 
For instance, 57.7% of NCX-expressed genes were temporally DEX 
across fetal development (periods 3-7), whereas 9.1% were during 
postnatal development (periods 8-12) and 0.7% were during adulthood 
(periods 13-15). Together, these data indicate that the majority of 
brain-expressed protein-coding genes are temporally and, to a lesser 
extent, spatially regulated, and that this regulation occurs predomi- 
nately during prenatal development. 
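
The threshold used to call a gene differentially expressed combines a false-discovery-rate Q value with a fold-difference cut-off. The sketch below shows one common way to compute such Q values, the Benjamini-Hochberg step-up procedure; the text does not state which FDR method was used, so that choice is an assumption, as are the function names. It also assumes that a twofold difference corresponds to a difference of 1 on the log2 scale.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted values (Q values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


def is_dex(q_value, log2_difference, max_q=0.01, min_log2_diff=1.0):
    """Q < 0.01 and at least a twofold (1 log2 unit) intensity difference."""
    return q_value < max_q and abs(log2_difference) >= min_log2_diff


q_values = benjamini_hochberg([0.001, 0.02, 0.03, 0.5])
calls = [is_dex(q, d) for q, d in zip(q_values, [1.4, 2.0, 0.3, 1.1])]
print(q_values, calls)
```
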


Transcriptional architecture of the human brain 
To assess transcriptional relatedness between brain regions/NCX
areas, we calculated correlation matrices of pairwise comparisons
(Fig. 1c) and performed unsupervised hierarchical clustering across
periods 3-15, an interval during which all analysed regions/areas can
be consistently followed across time. Among regions, this analysis
showed distinct and developmentally regulated clustering of NCX
(combination of 11 areas), HIP and AMY, with CBC having the most
distinctive transcriptional profile. At the level of NCX areas, cluster-
ing formed the following groups during fetal periods: OFC, DFC and
MFC; VFC and primary somatomotor cortex (S1C and M1C); and
parietal-temporal perisylvian areas (IPC, A1C and STC). V1C had the
most distinctive transcriptional profile of NCX areas throughout
development and adulthood. The increased correlations between
NCX, HIP, AMY and the majority of non-V1C NCX areas with age
indicate that transcriptional differences are particularly pronounced
during development.

metric; stress = 18.9%. Euclidean distance of log2-transformed signal intensity
was used to measure pairwise similarity. c, Heat map matrix of pairwise
Spearman correlations between brain regions (top) and between NCX areas
(bottom) during fetal development (periods 3-7), postnatal development
(periods 8-12) and adulthood (periods 13-15).
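
A matrix of pairwise Spearman correlations between regional expression profiles, of the kind shown in Fig. 1c, can be sketched in a few lines. This is an illustrative implementation using only the standard library; the function names and the input layout (a dictionary mapping a region label to a list of gene-level values) are assumptions, and the example values are invented.

```python
import math


def average_ranks(values):
    """Rank values from 1, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_matrix(profiles):
    """profiles: {region: [expression value per gene]} -> {(a, b): rho}."""
    names = sorted(profiles)
    return {(a, b): spearman(profiles[a], profiles[b]) for a in names for b in names}


# Invented values for three regions and four genes.
matrix = correlation_matrix({
    "NCX": [7.1, 8.4, 6.0, 9.2],
    "HIP": [7.0, 8.9, 6.2, 9.0],
    "CBC": [9.5, 6.1, 8.8, 6.4],
})
print(matrix[("HIP", "NCX")], matrix[("CBC", "NCX")])
```
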

Consistent with the clustering observed, CBC showed the greatest 
number of region-restricted or region-enriched DEX genes, with 516 
(4.8%) of 10,729 genes spatially DEX (Supplementary Information, 
section 6.4, and Supplementary Table 5). By contrast, the numbers of 
genes highly enriched in the other regions were lower: NCX, 46 
(0.43%); HIP, 48 (0.45%); AMY, 4 (0.04%); STR, 137 (1.28%); MD, 
216 (2.01%). The majority of these spatially enriched genes were also 
temporally regulated, and some, such as those in Supplementary Figs 
14 and 15 (NCX: FLJ32063, KCNS1; HIP: CDC20B, METTL7B; AMY: 
TFAP2D, UTS2D; STR: C10orf11, PTPN7; MD: CEACAM21,
SLC24A5; CBC: ESRRB, ZP2), were transiently enriched during a 
narrow time window. These clustering and region-enrichment results 
reveal that regional transcriptomes are developmentally regulated and 
reflect anatomical differences. 


Spatio-temporal differential exon usage 
Alternative exon usage is an important mechanism for generating 
transcript diversity. Using a splicing analysis of variance and a
splicing index algorithm with conservative criteria (Q < 0.01 with a
minimum twofold splice index difference between at least two 
regions/areas or periods; Supplementary Information, section 6.5), 
we found that 13,647 (90.2%) of 15,132 expressed genes showed 
DEU across sampled regions (0.1%), periods (19.5%) or both 
(70.6%). Of 14,375 NCX genes, 88.7% showed DEU across sampled 
areas (<0.01%), periods (59.8%) or both (28.9%). The regulation of 
DEU also varied in time, with the majority of expressed genes (83.0%) 
showing temporal DEU across fetal development, whereas only 0.9% 
and 1.4% were temporally regulated across postnatal development 
and adulthood, respectively. 

Focusing on ANKRD32, a gene we have previously shown to express
an alternative variant in the late mid-fetal frontal cortex, we
confirmed and extended our findings on DEU by showing that whereas
the longer isoform (ANKRD32a) was equally expressed across fetal
NCX areas, the shorter isoform (ANKRD32b), comprising the last
three exons, exhibited dynamic areal patterns. ANKRD32b was tran-
siently expressed in a gradient along the anterior-posterior axis of the
mid-fetal frontal cortex, with the highest expression in OFC and the
lowest in M1C. Before this, ANKRD32b was most highly enriched in
the ITC and, to a lesser extent, the STC. These spatio-temporal patterns
disappeared after birth, when only ANKRD32a was expressed, and 
were not observed in the mouse NCX of equivalent ages (Sup- 
plementary Fig. 16 and Supplementary Table 6). These findings illus- 
trate the complexity of DEU in the human brain and demonstrate 
how specific alternative transcripts can be spatially restricted during 
a narrow developmental window and with interspecies differences. 


Sex differences in the transcriptome 

Sex-biased gene expression 

Previous studies have identified sexually dimorphic gene expression 
in the developing and adult human brain. Analysis of our data set
using a sliding-window algorithm and t-test model (Q < 0.01 with
>2-fold difference in log2-transformed signal intensity; Supplemen-
tary Information, section 6.6) identified 159 genes, including a number
of previously reported and newly uncovered genes with male or female 
bias in expression located on the Y (13 genes), X (9 genes) and 
autosomal (137 genes) chromosomes. A large fraction (76.7%) had 
male-biased expression (Fig. 2a and Supplementary Table 7). Notable 
spatial differences were observed, and more genes had sex-biased 
expression during prenatal development than during postnatal life, 
with the adult brain characterized by having the fewest. 

Consistent with previous findings, we found that the largest differ-
ences were attributable to Y-chromosome genes, especially PCDH11Y,
RPS4Y1, USP9Y, DDX3Y, NLGN4Y, UTY, EIF1AY and ZFY, which
showed constant expression across regions and periods, with the excep-
tion of PCDH11Y downregulation in the postnatal CBC (Fig. 2b).
Notably, the functional homologues of these genes on the X chro-
mosome, barring ZFX during fetal development (PCDH11X, RPS4X,
USP9X, DDX3X, NLGN4X, UTX and EIF1AX), were not upregulated
in a compensatory manner in female brains (Supplementary Fig. 17). 

We also found other X-linked and autosomal genes with sex-biased 
expression and distinct spatio-temporal patterns, including functionally 
uncharacterized transcripts (LOC554203, C3orf62, FLJ35409 (also
known as MIR137HG) and DKFZP586I1420), S100A10 (which has been
linked to depression) and IGF2 (an imprinted autosomal gene previ-
ously implicated in embryonic growth and cognitive function), that
showed population-level male-biased expression (Fig. 2c).


Figure 2 | Sex-biased gene expression. a, Number of sex-biased DEX
genes in brain regions/NCX areas during fetal development
(periods 3-7), postnatal development (periods 8-12) and adulthood
(periods 13-15). b, PCDH11Y exon array signal intensity (left) and
validation by quantitative PCR with reverse transcription (qRT-PCR;
right) (N = 5 male brains per period). c, IGF2 exon array signal
intensity (left) and qRT-PCR (right) validation in NCX (N = 4 per sex
and period). P values were calculated by unpaired t-test. Whiskers
indicate fifth and ninety-fifth percentiles, respectively. NS,
not significant.

Sex-biased exon usage

We next explored sex-biased DEU using a sliding-window algorithm 
with a splicing t-test model (Q < 0.01 and splicing index >2; Sup-
plementary Information, section 6.6). We identified 155 genes (145
autosomal) that showed sex-biased expression of probe sets encoding
one or a subset of exons (Supplementary Table 8) in one or multiple
regions/NCX areas. These included several members of the collagen
family of genes (COL1A1, COL1A2, COL3A1, COL5A2 and COL6A3),
C3, KCNH2 (a gene associated with schizophrenia), NOTCH3 (a
gene mutated in a common form of hereditary stroke disorder),
ELN (a gene located within the Williams syndrome critical region)
and NLGN4X (an X-chromosome gene implicated in synapse func-
tion and associated with ASD and moderate X-linked intellectual
disability). Although comparably expressed in males and females
at the population and gene levels (Supplementary Fig. 17), NLGN4X
had a significant male bias in expression of exon 7 and, to a lesser extent,
exons 1, 5 and 6 in a developmentally regulated manner (Fig. 3).
Together, these findings show that developmentally and spatially regu- 
lated differences in gene- and exon-level expression exist between male 
and female brains. 


Gene co-expression networks 

To extract additional biological information embedded in the multi-
dimensional transcriptome data set, we performed weighted gene co-
expression network analysis, which allowed us to identify modules
of co-expressed genes. We identified 29 modules associated with
distinct spatio-temporal expression patterns and biological processes
(Fig. 4a; Supplementary Information, section 6.7; Supplementary
Tables 9-11; and Supplementary Figs 18-20). Among modules cor-
responding to specific spatio-temporal patterns, M8 consisted of 24
genes with a common developmental trend that showed the highest
expression levels in early fetal NCX and HIP (period 3), followed by a
progressive decline in expression levels with age until infancy (period
9) (Fig. 4b). By contrast, M15 contained 310 genes showing changes in
the opposite direction (relative to those in M8) in the NCX, HIP, AMY
and STR (Fig. 4c). Gene ontology enrichment analysis showed that
genes in M8 were enriched for gene ontology categories related to
neuronal differentiation (Bonferroni-adjusted P = 7.7 X 10 *) and
transcription factors (P = 5.2 X 10-*) (Supplementary Information,
section 6.8, and Supplementary Table 9). Conversely, M15 gene onto-
logy categories included ionic channels (P = 8.0 x 10 8) and neuro-
active ligand-receptor interaction (P = 4.0 x 10").

Figure 3 | Sex-biased differential exon usage. a, Gene structure and probe set
composition of NLGN4X. Yellow and green arrows depict primers used for
qRT-PCR validation. b, Heat map of the log2 male/female signal intensity ratio
of each exon for fetal development (periods 3-7), postnatal development
(periods 8-12) and adulthood (periods 13-15). Differences in expression of
exon 7 (yellow frame) and the 3’ untranslated region (UTR; green frame) in
adult NCX are highlighted. Note that exons 2 and 3A did not meet our
expression criteria and are not represented. c, qRT-PCR validation of
expression of exon 7 and the 3’ UTR in adult NCX (N = 4 per sex). P values
were calculated by unpaired t-test. Whiskers indicate fifth and ninety-fifth
percentiles, respectively.

Genes with the highest degree of connectivity within a module are 
termed hub genes and are expected to be functionally important within 
the module. M8 hub genes included transcription factors TBR1, FEZF2, 
FOXG1, SATB2, NEUROD6 and EMX1 (Fig. 4b), which have been 
functionally implicated in the development of NCX and HIP projection 
neurons. Furthermore, FOXG1 variants have been linked to Rett
syndrome and intellectual disability. Sequence variants in M15 hub
genes (Fig. 4c) have been linked to major depression (GDA) and to
schizophrenia and affective disorders (NRGN and RGS4).

We also identified two large-scale, temporally regulated modules 
(M20 and M2) with opposite developmental trajectories of genes co- 
expressed across regions: expression in M20 gradually decreased with 
age and expression in M2 gradually increased (Supplementary Figs 21 
and 22). M20 was enriched for gene ontology categories related to 
zinc-finger proteins (P=7.3 X10 **) and transcription factors 
(P=4.8 X10 *°), including many ZNF and SOX family members. 
M2 was enriched for gene ontology categories related to membrane 
proteins (P=1.8 x 10 *'), calcium signalling (P= 8.1 x 107 1°), 
synaptic transmission (P= 1.6 X 10 °) and neuroactive ligand- 
receptor interaction (P = 4.1 X 10” *), reflecting processes important 
in postnatal brain maturation. Their hub genes encoded transcrip- 
tional factors, modulators of chromatin state and signal transduction 
proteins, all of which are likely to be involved in driving the co- 
expression networks. Drastic expression shifts in M20 and M2 in 
the opposite direction just before birth indicate that this period is 
associated with global transcriptional changes that probably reflect 
environmental influences on brain development and intrinsic 
changes in cellular composition and functional processes. 


Expression trajectories of neurodevelopment 


One important use for the generated data set is to gain insight into 
normal and abnormal human neurodevelopment by analysing 
trajectories of individual genes or groups of genes associated with a 
particular neurobiological category or disease. To test this strategy, we 
compared our expression data for DCX (a gene expressed in neuronal 
progenitor cells and immature migrating neurons), as well as for genes 
associated with dendrite (MAP1A, MAPT, CAMK2A) and synapse 
(SYP, SYPL1, SYPL2, SYN1) development, with independently generated, 
non-transcriptome human data sets. The DCX expression trajectory was 
remarkably reminiscent of the reported changes in the density of DCX-
immunopositive cells in the postnatal human HIP (r = 0.946,
Pearson correlation; Fig. 5a). In our transcriptome data set, DCX expres-
sion increased until early mid-fetal development (period 5) and then
gradually declined with age until early childhood (period 10). Likewise,
expression trajectories of dendrite and synapse development gene
groups closely paralleled the growth of basal dendrites of DFC pyramidal
neurons (r = 0.810 for layer 3 and r = 0.700 for layer 5; Fig. 5b) and
DFC synaptogenesis (r = 0.940; Fig. 5c), respectively. Steep increases in
both processes occurred between the late mid-fetal period and late 
infancy, indicating that a considerable portion of these two processes 
occurs before birth and reaches a plateau around late infancy. 

After demonstrating the accuracy and viability of using the data set 
to profile human neurodevelopment, we manually curated lists of 
genes associated with over 80 categories, including various neuro- 
developmental processes, neural cell types and neurotransmitter 
systems (Supplementary Information 6.9 and Supplementary Table 
12). Notable trajectories and differences in their onset times, rates of 
increase and decrease, and shapes were observed within and between 
brain regions for categories including major neurodevelopmental 
processes (neural cell proliferation and migration, dendrite and 
synapse development, and myelination; Fig. 5d), cortical GABAergic 
inhibitory interneurons (CALB1, CALB2, NOS1, PVALB and VIP) and
glutamate receptors (Supplementary Figs 23 and 24). Two expected
patterns were observed in neurodevelopmental trajectories: changes in
expression of cell proliferation genes preceded the increase in expres-
sion of DCX, and expression of each decreased during perinatal
development whereas synapse development, dendrite development
and myelination trajectories increased. Notably, the NCX trajectory
for synapse development did not drastically decline during late child-
hood or adolescence (Fig. 5c, d) as previously reported for synapse
density. We also identified co-expression network modules and addi-
tional genes that are highly correlated with the categories
(Supplementary Tables 10, 13 and 14). For example, M20 and M2 were
strongly correlated with neuron migration (r = 0.894) and myelination
(r = 0.972), respectively.

Figure 4 | Global co-expression networks and gene modules. a, Dendrogram
from gene co-expression network analysis of samples from periods 3-15.
Modules of co-expressed genes were assigned a colour and number (M1 to
M29). b, Left: heat map of genes in M8 showing the spatio-temporal expression
pattern after hierarchical clustering. The expression values for each gene are
arranged in the heat map, ordered first by brain region, then by age and last by

In addition, our data set enabled us to generate expression trajectories 
of genes commonly associated with ASD and schizophrenia. We inves- 
tigated a number of genes previously linked to these disorders (Sup- 
plementary Information, section 6.10) and observed distinct and 
dynamic expression patterns, especially among NCX areas (Sup- 
plementary Fig. 25 shows examples for CNTNAP2, MET, NLGN4X 
and NRGN). To gain insight into potential biological functions of ASD- 
and schizophrenia-associated genes in human neurodevelopment, we 
identified other genes with significantly correlated spatio-temporal 
expression profiles and performed gene ontology enrichment analysis 
(Supplementary Tables 15 and 16). These findings reveal associated 
spatio-temporal differences in these expression trajectories and pro- 
vide additional co-expressed genes that can be interrogated for their 
role in the respective processes or disorders.

Expression quantitative trait loci
Previous studies have identified expression quantitative trait loci
(eQTLs) in the adult human brain, primarily in the cerebral cortex.
Our multiregional developmental data set enabled us to search for 
association between SNP genotypes and spatio-temporal gene expres- 
sion. We tested only for cis-eQTLs, restricting the search to SNPs 
within 10 kilobases of either a transcription start site or a transcription 
end site, as opposed to trans-eQTLs, which would require much larger 
sample sizes. 
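
The restriction to cis-eQTL candidates described above amounts to a positional filter on SNP-gene pairs. The sketch below is a minimal illustration under stated assumptions: the function names, the input dictionaries, the inclusive treatment of the 10-kilobase boundary and the example identifiers and coordinates are all hypothetical.

```python
def in_cis_window(snp_pos, tss, tes, window=10_000):
    """True if the SNP lies within `window` bases of the TSS or the TES."""
    return abs(snp_pos - tss) <= window or abs(snp_pos - tes) <= window


def cis_candidates(snps, genes, window=10_000):
    """Return sorted (snp_id, gene) pairs to be tested for association.

    snps:  {snp_id: (chromosome, position)}
    genes: {gene: (chromosome, tss_position, tes_position)}
    """
    pairs = []
    for gene, (g_chrom, tss, tes) in genes.items():
        for snp_id, (s_chrom, pos) in snps.items():
            if s_chrom == g_chrom and in_cis_window(pos, tss, tes, window):
                pairs.append((snp_id, gene))
    return sorted(pairs)


# Hypothetical identifiers and coordinates, for illustration only.
genes = {"GENE1": ("chr1", 100_000, 150_000)}
snps = {
    "snpA": ("chr1", 95_000),   # 5 kb upstream of the TSS
    "snpB": ("chr1", 125_000),  # mid-gene, more than 10 kb from either end
    "snpC": ("chr1", 160_000),  # exactly 10 kb beyond the TES
    "snpD": ("chr2", 100_000),  # different chromosome
}
print(cis_candidates(snps, genes))
```

Note that a SNP in the middle of a long gene is excluded by this rule, because the window is defined around the two transcript ends rather than the whole gene body.
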

Implementing a conservative strategy (gene-wide Bonferroni cor- 
rection followed by genome-wide Q < 0.1; Supplementary Informa- 
tion, section 9), we identified 39 NCX, eight HIP, four AMY, two STR, 
six MD and five CBC genes (Supplementary Table 17) with evidence of 
cis-eQTL, including two previously reported genes (ITGB3BP and
ANKRD27). Consistent with previous studies, associated SNPs were
enriched near transcription start and termination sites (Fig. 6a, b).

An example of a significant association in NCX, MD and CBC is
that between SNP rs10785190 and GLIPR1L2, a member of the glioma
pathogenesis-related 1 family of genes. The expression differences
were observed at the level of the whole transcript and exons 1 and 2, 
the only exons we observed to be expressed at appreciable levels in the 
NCX (Fig. 6c, d). The NCX probably had more cis-eQTLs than other 
regions owing to its smaller variation in gene expression resulting 
from the averaged expression of 11 areas. Many eQTLs identified as 
significant in NCX also have similar associations in other regions, 
although they were not statistically significant after the conservative 
genome-wide correction (Supplementary Table 17). Thus, we have
identified polymorphic regulators of transcription in different regions
across development, potentially providing insights into inter-
individual differences and genetic control of the brain transcriptome.

Figure 5 | Trajectories of genes associated with neurodevelopmental
processes. a, Comparison between DCX expression in HIP and the density of
DCX-immunopositive cells in the human dentate gyrus. b, Comparison
between transcriptome-based dendrite development trajectory in DFC and
Golgi-method-based growth of basal dendrites of layer 3 (L3) and 5 (L5)
pyramidal neurons in the human DFC. c, Comparison between
transcriptome-based synapse development trajectory in DFC and density of


Discussion 


Our analysis reveals several features of the human brain transcriptome, 
and increases our knowledge of the transcriptional events in human 
neurodevelopment. We show that gene expression and exon usage have 
complex and dynamically regulated patterns, some of which may not be 
evident in the transcriptomes of commonly studied model organisms. 
Moreover, these patterns differ more prominently across time and space 
than they do between sexes, ethnicities or individuals, despite their 
underlying genetic differences. Transcriptome differences between 
males and females also included several disease-related genes, offering 
possible mechanisms underlying the sex differences in the incidence, 
prevalence and severity of some brain disorders. We also found that 
some of the inter-individual variations in the regional and develop- 
mental transcriptomes were associated with specific SNP genotypes, 
which may have altered expression-regulating elements. Thus, the pre- 
sent data set (available at http://www.humanbraintranscriptome.org), 
along with an accompanying study, provides a basis for a variety of
further investigations and comparisons with other transcriptome- 
related data sets of both healthy and diseased states. 

Although our study has uncovered many intricacies in gene expres- 
sion and exon usage in the human brain, there are potential limitations 
of our study that warrant discussion. Foremost, we used stringent 
criteria to minimize false positives and faithfully characterize general 
transcriptional patterns, rather than to capture all the changes that may 
occur. Also, we analysed dissected tissue that contained multiple cell 
types, thus diluting the transcriptional contribution and dynamic 
range of expression of any one specific cell type. Current limitations 
prevent us from using cell-type-specific approaches in systematically 
analysing the spatio-temporal transcriptome. Furthermore, the num- 
ber of brains and regions analysed so far is not sufficient to investigate 
the full magnitude of transcriptional changes or the full range of 
eQTLs. Application of sequencing technology will allow more in-depth
analyses of the transcriptome, and aid in discovery of novel or
low-expressing transcripts. Finally, although specific patterns of
expression are often linked to specialized biological processes, it is
important to remember that the relationship between messenger
RNA and protein levels is not always linear nor translated into appar-
ent phenotypic differences. As these concerns are addressed in future
with more samples and new data sets from human and non-human
primate brains, it will be possible to uncover deeper insights into the
transcriptional foundations of human brain development and evolu-
tion.

DFC synapses calculated using electron microscopy. For b and c, PC1 for gene
expression was plotted against age to represent the developmental trajectory of
genes associated with dendrite (b) or synapse (c) development. Independent
data sets were centred, scaled and plotted on a logarithmic scale. d, PC1 value
for the indicated sets of genes (expressed as percentage of maximum) plotted
against age to represent general trends and regional differences in several
neurodevelopmental processes in NCX, HIP and CBC.

Figure 6 | Association between SNPs and gene expression. a, b, SNP
distribution around transcription start sites (TSS; a) and transcription end sites
(TES; b) of the associated genes, based on several Wald test P-value cut-offs
after gene-wide Bonferroni correction. c, GLIPR1L2 expression association
with rs10785190 genotype, a SNP located in exon 1. The solid and dashed
curves, calculated from locally weighted scatter-plot smoothing (LOWESS),
show the developmental trends of gene expression and exon-1 and exon-2
expression, respectively. d, qRT-PCR validation of exon-1 and exon-2
expression in NCX for each genotype (N = 14 GG, 14 AG, 8 AA), plotted
relative to the LOWESS curve in c to facilitate comparison across
developmental periods. P values were calculated by unpaired t-test. Whiskers
indicate fifth and ninety-fifth percentiles, respectively.


METHODS SUMMARY 


Supplementary Information, sections 3-9, provides a full description of tissue 
acquisition and processing, data generation, validation and analyses.


Kelvin-Helmholtz instabilities as the source of
inhomogeneous mixing in nova explosions


Jordi Casanova, Jordi José, Enrique Garcia-Berro, Steven N. Shore & Alan C. Calder


Classical novae are thermonuclear explosions in binary stellar
systems containing a white dwarf accreting material from a close
companion star. They repeatedly eject 10⁻⁴ to 10⁻⁵ solar masses of
nucleosynthetically enriched gas into the interstellar medium,
recurring on intervals of decades to tens of millennia. They are
probably the main sources of Galactic ¹⁵N, ¹⁷O and ¹³C. The
origin of the large enhancements and inhomogeneous distribution
of these species observed in high-resolution spectra of ejected
nova shells has, however, remained unexplained for almost half a
century. Several mechanisms, including mixing by diffusion,
shear or resonant gravity waves, have been proposed in the
framework of one-dimensional or two-dimensional simulations,
but none has hitherto proven successful because convective mixing
can only be modelled accurately in three dimensions. Here we
report the results of a three-dimensional nuclear-hydrodynamic
simulation of mixing at the core-envelope interface during nova
outbursts. We show that buoyant fingering drives vortices from the
Kelvin-Helmholtz instability, which inevitably enriches the accreted
envelope with material from the outer white-dwarf core. Such mixing
also naturally produces large-scale chemical inhomogeneities. Both
the metallicity enhancement and the intrinsic dispersions in the
abundances are consistent with the observed values.

High-resolution spectra of nova shells, taken before the ejecta have
undergone substantial modifications through interactions with the
stellar companions or with the interstellar medium, always show
highly fragmented, chemically enriched and inhomogeneous shells.
V1974 Cyg, for instance, although a neon nova, showed large (more
than threefold) differences in the C/He abundance ratios between two
knots resolved in the spectrum a few years after outburst. A similar
abundance pattern has also been observed in many other novae, such
as HR Del 1967 and DQ Her 1934, for which the shells are spatially
resolved. Comparison of the infrared spectrum with the ultraviolet
spectrum for the same ions at stages when the former, then the latter,
turn optically thin shows the same structures to be present even during
the previous opaque stages, as do the multiple line systems known for
decades from optical spectra.

The nova outbursts (whether involving He-rich, CO-rich or ONe- 
rich white dwarfs) are triggered by nuclear processes dominated by 
CNO-cycle reactions that produce '°N, '*!°O and '’F far in excess of 
solar abundances. Convection begins as soon as the temperature gra- 
dient becomes super-adiabatic, powered by the energy released from 
nuclear reactions (driven mainly by proton captures and B* decays’, 
with a main nuclear path running close to the valley of stability), and is 
critical to the explosion, transferring a fraction of these abundant, 
short-lived species to the outer envelope layers. The energy released 
when these nuclei decay lifts degeneracy and drives the expansion, and 
ultimate ejection, of the polluted strata. Hence, any attempt to 
explain the peculiarities in the abundance pattern observed in nova 
ejecta requires an accurate model of the physical processes that occur 
during the explosion, namely the nuclear processes during the ther- 
monuclear runaway along with the convective mass and energy trans- 
port. The peak temperatures reached during a nova explosion are 
constrained by the chemical abundance pattern inferred from the 
ejecta and do not exceed 4 × 10⁸ K, so it is unlikely that the observed 
metallicity enhancements can be due to thermonuclear processes 
driven by CNO breakout. Instead, mixing at the core-envelope inter- 
face is the more likely explanation. This cannot be modelled in the one- 
dimensional framework traditionally used in nova nucleosynthesis 
simulations, because mixing is inhibited by the one-dimensional 
mixing-length formalism of convection. 

Early attempts in two-dimensional simulations have shown that 
the onset of convection at the late stages of the thermonuclear runaway, 
driven by shear flows at the core-envelope interface, will ultimately 
dredge up chemically enriched material into the envelope. However, 
two-dimensional approximations for convection are unrealistic: 
the conservation of vorticity imposed by the two-dimensional geometry 
forces the small convective cells to merge into large eddies, with 
a size comparable to the pressure scale height of the envelope. In 
contrast, in three-dimensional fully developed turbulent convection, 
eddies will become unstable and consequently will break up and filament, 
transferring their energy to progressively smaller scales. 
These structures, vortices and filaments, must undergo a similar fate 
down to roughly the Kolmogorov scale, η = (ν³/ε)^(1/4), where ν is the 
kinematic viscosity and ε is the energy dissipation rate. Until now there 
has been no indication of such a cascade in nova hydrodynamic simulations. 
We show that with sufficient resolution, proper treatment of 
the nuclear processing, and long time spans, this turbulent energy 
transfer occurs and solves the mixing problem. 
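
As a rough illustration of the Kolmogorov-scale expression quoted above, the following minimal sketch evaluates the scale for given viscosity and dissipation rate. The function name and the numerical inputs are hypothetical placeholders chosen for easy arithmetic; they are not values taken from the simulation.

```python
def kolmogorov_scale(nu, eps):
    """Return the Kolmogorov length (nu**3 / eps) ** (1/4).

    nu  : kinematic viscosity (e.g. cm^2 s^-1)
    eps : energy dissipation rate per unit mass (e.g. cm^2 s^-3)
    Both inputs must use the same unit system.
    """
    if nu <= 0 or eps <= 0:
        raise ValueError("nu and eps must be positive")
    return (nu ** 3 / eps) ** 0.25


# Hypothetical inputs in cgs units, for illustration only.
nu_example = 1.0e2    # cm^2 s^-1 (assumed)
eps_example = 1.0e6   # cm^2 s^-3 (assumed)
eta_example = kolmogorov_scale(nu_example, eps_example)
print(f"Kolmogorov scale: {eta_example:.3g} cm")
```

Because the scale depends on the dissipation rate only through a fourth root, a sixteen-fold increase in dissipation merely halves the smallest eddy size.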

We therefore performed three-dimensional simulations of mixing 
at the core-envelope interface during classical nova explosions with 
the multidimensional, Eulerian, explicit code FLASH. The initial 
model consists of a 1 M⊙ CO white dwarf that accretes solar-composition 
matter (Z = 0.02) at a rate of 5 × 10⁻⁹ M⊙ yr⁻¹. The model 
was evolved in one dimension and subsequently mapped onto a three-dimensional 
Cartesian grid of 800 × 800 × 800 km³ when the temperature 
at the base of the envelope reached 10⁸ K (see Fig. 1 and 
Supplementary Information). The model was relaxed to guarantee 
hydrostatic equilibrium. A top-hat, 2-km-wide temperature perturbation 
(5% amplitude) was imposed close to the core-envelope interface. 
As tested thoroughly in two dimensions, the specific choice of the 
initial perturbation (that is, duration, strength, location and size), the 
resolution adopted or the size of the computational domain does not 
have a severe impact on the results of the simulation. The initial per- 
turbation drives a shear flow triggering Kelvin-Helmholtz instabilities 
about 150 s after the start of the simulation. Small convection cells 
develop as soon as material is dredged up into the envelope. Moreover, 
the fluid velocity remains below the speed of sound, confirming that a 
nova outburst is driven by a (subsonic) deflagration rather than a 

Figure 1 | Mixing driven by Kelvin-Helmholtz instabilities. Snapshots of 
the development of Kelvin-Helmholtz instabilities at t = 151 s (a), 193 s 
(b), 296 s (c) and 379 s (d), shown in terms of the ¹²C mass fraction on a 
logarithmic scale. Dredging of core material driven by Kelvin-Helmholtz 
instabilities translates into a mass-averaged abundance of CNO nuclei in the 
envelope of 0.118, 0.129, 0.157 and 0.182, respectively. The mean CNO 
abundance at the end of the simulation reaches 0.20 by mass (see 
Supplementary Movies). Calculations were performed with FLASH, a 
parallelized explicit Eulerian code, based on the piecewise parabolic 
interpolation of physical quantities for solving the hydrodynamic equations, 
and with adaptive mesh refinement (with four or five levels of refinement). 
Simulations were run at the MareNostrum supercomputer, requiring 150,000 
CPU hours with 256 (occasionally 512) processors. The typical resolution 
adopted was 3.12 × 3.12 × 3.12 km³, with a maximum resolution of 
1.56 × 1.56 × 1.56 km³. The three-dimensional computational domain initially 
comprised 112 radial layers (including the outermost part of the CO core) 
and 512 horizontal layers along both horizontal axes. The mass of the accreted 
envelope was about 2 × 10⁻⁵ M⊙. Nuclear energy generation was handled 
through a network of 13 species (¹H, ⁴He, ¹²,¹³C, ¹³,¹⁴,¹⁵N, ¹⁴,¹⁵,¹⁶,¹⁷O and 
¹⁷,¹⁸F), connected through 18 nuclear processes. Periodic conditions were 
imposed at the four vertical boundaries of the computational domain, and 
hydrostatic equilibrium with an outflow constraint at the top and a reflecting 
constraint at the bottom was imposed on the velocity at the horizontal 
boundaries. Other details on the input physics are identical to those adopted 
in earlier two-dimensional simulations. 


(supersonic) detonation. Even for a point-like ignition, the burning 
front quickly spreads horizontally, such that the expansion and pro- 
gress of the explosion proceed in almost spherical symmetry. This 
confirms early estimates of the velocity of the deflagration front 
spreading through the stellar surface, in the form v_def ~ (h_p v_conv/τ_burn)^(1/2), 
where h_p is the pressure scale height, v_conv the characteristic 
convective velocity and τ_burn the characteristic timescale for fuel burning. 
Typical values for nova outbursts yield v_def ≈ 10⁴ cm s⁻¹; that is, a 
flame propagating halfway across the stellar surface in about 1.3 days. 
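
The scaling above can be checked numerically with a short sketch. The input values for the scale height, convective velocity and burning timescale are assumptions picked only so that the result lands at the quoted order of magnitude (10⁴ cm s⁻¹); the text does not state them. The second helper converts the quoted speed and travel time into the path length they imply.

```python
import math

SECONDS_PER_DAY = 86400.0


def deflagration_speed(h_p, v_conv, tau_burn):
    """Front speed estimate sqrt(h_p * v_conv / tau_burn), cgs units."""
    return math.sqrt(h_p * v_conv / tau_burn)


def distance_travelled(v_def, days):
    """Path length (cm) covered at constant speed v_def in the given time."""
    return v_def * days * SECONDS_PER_DAY


# Assumed, illustrative inputs (not taken from the paper).
h_p = 1.0e7        # pressure scale height, cm (assumed)
v_conv = 1.0e6     # convective velocity, cm s^-1 (assumed)
tau_burn = 1.0e5   # fuel-burning timescale, s (assumed)

v_def = deflagration_speed(h_p, v_conv, tau_burn)
path_cm = distance_travelled(1.0e4, 1.3)
print(f"v_def = {v_def:.3g} cm/s; 1.3 days at 1e4 cm/s covers {path_cm:.3g} cm")
```

At 10⁴ cm s⁻¹, 1.3 days corresponds to a path of roughly 1.1 × 10⁹ cm, which is the order of a white-dwarf radius.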

At t = 400s, matter crosses the outer computational boundary and 
we stop the calculations because of the Eulerian nature of the FLASH 
code. At this stage, the envelope base has reached a peak temperature 
of 1.82 × 10⁸ K, and the mean (mass-averaged) metallicity in the 
envelope reaches Z ~ 0.20. This agrees with observations of CO novae 
and with previous two-dimensional modelling, suggesting that the 
dimensionality of the convective treatment is not crucial to the mean 
values but is important for the details (in particular for an accurate 
description of convective transport). The simulations reveal that these 
last stages are, however, intermittent, and our three-dimensional simu- 
lations seem to resolve at least the upper dissipation range (see 
Supplementary Movies). In that sense, the Kolmogorov scale, η, provides 
an estimate of the size of the smallest eddies present in the flow. 
Also at this stage, the Reynolds number becomes sufficiently small and 
molecular viscosity is effective in dissipating the kinetic energy into heat. 

In our simulations, the burning advances along with the develop- 
ment of persistent density contrasts of large size, comparable to the 
thickness of the layer and much larger than the burning transition 
zone. These become turbulent, and the models require a fully three- 
dimensional treatment to capture the full spectrum of the plumes and 
vortex structures. The resulting abundances from our simulations have 
another particular feature that agrees with and explains the observa- 
tions: the structures are chemically inhomogeneous. This is a relic of 
Rayleigh-Taylor instabilities that grow during the initial stages of the 
ejection. These structures appear within the burning zone (that is, the 
initial plumes) and drive a secondary Kelvin-Helmholtz vortex cascade 
that induces turbulent motions. In this regime of very high Reynolds 
number, the fluid motion is extremely complex and develops structures 
on all scales. In particular, vortices and filaments appear that are a 
signature of the intermittency in classical shear-flow turbulence. 
These are more evident in two-dimensional simulations, but they cascade 
rapidly into smaller eddies and filaments, subject to recombination and 
extension, as the burning continues. Such dissipation is intermittent, as 
predicted in the Kolmogorov-Obukhov theory of turbulence, 
generating coherent, persisting structures that advect with the expand- 
ing layers. Because the nuclear reaction rates are density sensitive, 
higher-density knots have a different nuclear history from that of the 
background, and this is best characterized by the abundance distri- 
bution function behind the deflagration. 

This can be clearly seen in Fig. 2, which shows a sample of the time 
evolution of the cumulant function. The initial abundance for O, our 
trace species (whose abundance is increased by the deep non-uniform 
mixing after the onset of thermal buoyant turbulence), is a narrow 
initial distribution that evolves into a stable form with a lower cutoff 
and a power-law tail towards high abundances. Unlike the single value 
obtained in one-dimensional models, we find a 30% dispersion in the 
main component, fitted by a Gaussian (the dashed line in Fig. 2), and 
containing about 10% of the total volume, an extended, non-Gaussian 
‘fat tail’ whose maximum abundance (at the 1% level) extends up to 
13σ from the mean ¹⁶O for the volume. Multiwavelength spectroscopic 
analyses during the nova nebular stage, when the ejecta are essentially 
transparent, frequently find large dispersions from line to line in the 
abundance ratios of the principal chemical species. Although usually 
assumed to result from measurement uncertainties, which are fre- 
quently lower than the derived dispersions, this may instead be a 
physically significant result: a signature of the turbulence generated 
during the thermonuclear runaway. 

Spatially resolved knots in several classical novae, notably V1974 Cyg 
(Nova Cyg 1992), demonstrate the reality of this inhomogeneity in the 
ejecta, up to threefold, long before the mixtures are diluted by inter- 
action with the interstellar medium. These were first seen in the first 
weeks of the expansion of the ejecta across different spectral regions, 
from the infrared to the ultraviolet, and were invariant in velocity and 
contrast as each layer was exposed, after the opacity drop in the expand- 
ing medium. The contrast between these residual structures will be 
further amplified by the supersonic motions that follow the stage 
shown here. Although some structuring may result from collision of 
the ejecta with the accretion disk around the white dwarf and with the 

The dwarf planet Eris is a trans-Neptunian object with an orbital 
eccentricity of 0.44, an inclination of 44 degrees and a surface 
composition very similar to that of Pluto. It resides at present at 
95.7 astronomical units (1 AU is the Earth-Sun distance) from 
Earth, near its aphelion and more than three times farther than 
Pluto. Owing to this great distance, measuring its size or detecting 
a putative atmosphere is difficult. Here we report the observation 
of a multi-chord stellar occultation by Eris on 6 November 2010 UT. 
The event is consistent with a spherical shape for Eris, with radius 
1,163 ± 6 kilometres, density 2.52 ± 0.05 grams per cm³ and a high 
visible geometric albedo, p_V = 0.96 (+0.09/−0.04). No nitrogen, argon or 
methane atmospheres are detected with surface pressure larger 
than ~1 nanobar, about 10,000 times more tenuous than Pluto’s 
present atmosphere. As Pluto’s radius is estimated to be 
between 1,150 and 1,200 kilometres, Eris appears as a Pluto twin, 
with a bright surface possibly caused by a collapsed atmosphere, 
owing to its cold environment. We anticipate that this atmosphere 
may periodically sublimate as Eris approaches its perihelion, at 
37.8 astronomical units from the Sun. 

The dwarf planet (136199) Eris was discovered in 2005. Its radius 
has been estimated to be 1,200 ± 100 km on the basis of direct 
imaging, although detection of its thermal flux provided another 
estimate of 1,500 ± 200 km, potentially making it larger even than 
Pluto, and the largest known dwarf planet. The motion of Dysnomia 
(Eris’ satellite) provides Eris’ mass, M_E = (1.66 ± 0.02) × 10²² kg, 27% 
larger than Pluto’s mass. No short-term (day-scale) brightness variability 
has been detected for Eris at the 1% level, suggesting either a 
spherical body with no albedo variegation, or (if elongated) a finely 
tuned, pole-on viewing geometry. The spectrum of Eris is very similar 
to that of Pluto and reveals a methane-ice-rich cover, and another 
dominant ice, presumably nitrogen, but not excluding argon. 


Stellar occultations by Eris are rare, as it subtends a minuscule 
angular diameter (~0.03 arcsec) while currently moving in severely 
depleted stellar fields at an angular rate of ~1.5 arcsec h⁻¹ at most. 
Using the techniques described in ref. 15, we predicted one Eris 
occultation in 2010, on November 6 UT. We attempted observations 
from 26 stations, and the occultation was detected from two sites in 
Chile, with two detections at San Pedro de Atacama (San Pedro for 
short) with the Harlingten and ASH2 telescopes, 20 m from each other, 
and one detection at La Silla, with the TRAPPIST telescope (for details, 
see Fig. 1, Supplementary Figs 1 and 2, and Supplementary Tables 1 
and 2). Another station further south at Complejo Astronomico El 
Leoncito (CASLEO), Argentina, provided a light curve without 
occultation, but went close to Eris’ shadow edge (~200 km; see Fig. 2). 
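
The quoted angular size can be reproduced with small-angle arithmetic. The sketch below is only a consistency check: it takes the 95.7 AU distance and the 1,163 km radius reported in this letter, plus the 69.436 km per milliarcsecond scale given in the Fig. 2 caption, and the function names are illustrative.

```python
import math

AU_KM = 149_597_870.7                           # astronomical unit in km
MAS_RAD = math.pi / (180.0 * 3600.0 * 1000.0)   # one milliarcsecond in radians


def km_per_mas(distance_au):
    """Projected length (km) subtended by 1 mas at the given distance."""
    return distance_au * AU_KM * MAS_RAD


def angular_diameter_mas(radius_km, scale_km_per_mas):
    """Angular diameter (mas) of a body of given radius at a given scale."""
    return 2.0 * radius_km / scale_km_per_mas


scale = km_per_mas(95.7)                        # from the rounded distance
diameter = angular_diameter_mas(1163.0, 69.436) # using the Fig. 2 scale
print(f"{scale:.2f} km per mas; angular diameter {diameter:.1f} mas")
```

The rounded distance gives a scale of about 69.4 km per mas, close to the 69.436 km quoted in Fig. 2, and the resulting diameter of roughly 33 mas matches the ~0.03 arcsec stated above.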

The San Pedro and La Silla observations provide two occultation 
segments—or ‘chords’—whose four extremities are used to constrain 
Eris’ size (red segments in Fig. 2). When deriving the occultation times, 
it appeared that two equally satisfactory solutions for the star reappear- 
ance time at the Harlingten telescope in San Pedro are possible, yield- 
ing two different chord lengths. These two solutions are separated by 
1.2 s, and are respectively called solution 1 and solution 2, in chronolo- 
gical order. This ambiguity is due to the fact that the star reappearance 
occurred during a gap between consecutive exposures, corresponding 
to a net loss of information. The ASH2 data collected next to 
Harlingten did not provide enough signal-to-noise ratio to discrim- 
inate between these two solutions, and are not used in the fit described 
below (see Supplementary Information). As a dwarf planet, Eris is 
expected to be in hydrostatic equilibrium under gravity and centrifugal 
forces. The most general apparent limb shape is then an ellipse with 
semi-axes a′ > b′ with effective radius R_E = √(a′b′), defined as the 
radius of a disk that has the same apparent surface area as the actual 
body. This shape stems either from an oblate Maclaurin spheroid 

Figure 1 | Eris occultation light curves. The plots (black filled circles) show 
the flux of the star plus Eris, normalized to unity outside the occultation, versus 
time. No filter was used at any of the telescopes. a, The light curve from the 
ASH2 40-cm telescope at San Pedro, using a SBIG STL-11000M CCD camera, 
with 2 × 2 pixel binning and a sub-frame of 11.24 × 9.71 arcmin 
(272 × 235 pixels). The horizontal error bars indicate the total time intervals 
associated with each point (15 s, while the cycle time was 18.32 s). Those bars 
are too small to be visible on the other data sets. b, The light curve from the 
Harlingten 50-cm telescope at San Pedro using an Apogee U42 CCD camera 
(2 × 2 pixel binning; sub-frame, 2.67 × 2.67 arcmin, or 100 × 100 pixels; 
integration/cycle times, 3 and 3.88 s). c, The light curve from the 60-cm 
TRAPPIST telescope at La Silla, using a FLI ProLine PL3041-BB CCD camera 
(2 × 2 pixel binning; sub-frame, 3.25 × 3.25 arcmin, or 150 × 150 pixels; 
integration/cycle times, 3 and 4.55 s). d, The light curve from the 215-cm Jorge 
Sahade telescope at CASLEO, using a Roper Scientific Versarray 1300B CCD 
camera (3 × 3 pixel binning; sub-frame, 2.62 × 3.50 arcmin, or 77 × 103 pixels; 
integration/cycle times, 4 and 7 s). The horizontal dashed lines at the bottom of 
a and c represent Eris’ contribution to the flux, showing that the star completely 
disappeared during the event (Supplementary Information, section 2). The red 
lines are the best square-well models fitted to the events. We show in b solution 
2 for the light curve (solution 1 being very close at that scale, Supplementary Fig. 3). 
The vertical arrow in d shows the time of closest approach (CA) to the shadow 
edge at CASLEO, at 8,368 s UT. 


(small angular momentum regime) with assumed equator-on viewing, 
or an elongated triaxial Jacobi ellipsoid (large angular momentum 
regime) observed pole-on, as implied by the absence of brightness 
variations. 
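
To make the effective-radius definition concrete, here is a minimal sketch that applies it to the elliptical limb quoted in the Fig. 2 caption (semi-axes of 1,708 km and 1,317 km). The function name is illustrative.

```python
import math


def effective_radius(a_km, b_km):
    """Radius of the disk with the same apparent area as an ellipse (a, b)."""
    return math.sqrt(a_km * b_km)


# Semi-axes of the dotted elliptic limb quoted in the Fig. 2 caption.
a_prime = 1708.0   # km
b_prime = 1317.0   # km

r_eff = effective_radius(a_prime, b_prime)
aspect_ratio = b_prime / a_prime
print(f"effective radius {r_eff:.0f} km, aspect ratio {aspect_ratio:.3f}")
```

This recovers the effective radius of about 1,500 km and the aspect ratio of 0.771 given for that limb, since an ellipse of area π a′ b′ matches a disk of radius √(a′b′).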

We have five free parameters to adjust: a’, the apparent flattening 
(a′ − b′)/a′, the ellipse position angle P in the sky plane, and the two 
coordinates of its centre, f_c, g_c (Supplementary Table 3). With four chord 
extremities, our observations allow for an infinity of limb solutions. 
However, as the two chords have almost the same median lines 
(Fig. 2), this strongly suggests that Eris’ shape is indeed close to spherical, 

Figure 2 | Measuring Eris’ size. The three oblique solid lines show the star 
trajectories relative to Eris, as seen from San Pedro, La Silla and CASLEO, with 
the arrow pointing towards the direction of motion. The San Pedro and La Silla 
timings provide the lengths of the two occultation segments, or ‘chords’ (in 
red); see solution 2 in Supplementary Table 3. The median lines of the two red 
segments are separated by only 5 km and coincide at that scale with the blue 
line. Celestial north is up and east is left. Scale bars: 1,000 km and 14.40 mas 
(1 mas corresponds to 69.436 km at Eris). The solid circle has a radius 
R_E = 1,163 km, and is our preferred solution for Eris’ size and shape, with the 
cross marking the position of the centre. The dot near ‘P’ indicates Dysnomia’s 
orbit pole direction projected onto Eris’ surface. The dotted curve is an elliptic 
limb fitted to our occultation chords, with semi-major and -minor axes of 
a′ = 1,708 and b′ = 1,317 km, respectively, that is, an apparent effective radius 
of R_E = 1,500 km, the value of Eris’ radius previously derived from thermal 
measurements. The long axis of the ellipse should be perpendicular to the 
occultation chords to within ±2° in order to match our data points. This has a 
low probability (2%) of occurring for a random limb orientation between 0 and 
180°. Furthermore, the ellipse has an aspect ratio b′/a′ = 0.771 that would 
require a fast rotator (with a period of 4.4 h) observed pole-on to within 18° to 
suppress the rotational light curve. This also has a low probability (5%) of 
occurring for a randomly distributed pole orientation, making the dotted limb 
solution unlikely. 
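
The two probabilities in the caption can be reproduced under simple assumptions, which are ours rather than stated in the text: the limb position angle is taken as uniform over 0 to 180°, and the rotation axis as isotropically distributed, so that the chance of lying within an angle θ of the line of sight is 1 − cos θ. The function names are illustrative.

```python
import math


def limb_alignment_probability(tolerance_deg, span_deg=180.0):
    """Chance that a uniformly random position angle falls within
    +/- tolerance_deg of a required direction."""
    return 2.0 * tolerance_deg / span_deg


def pole_on_probability(max_tilt_deg):
    """Chance that an isotropically oriented axis lies within
    max_tilt_deg of the line of sight (either pole facing the observer)."""
    return 1.0 - math.cos(math.radians(max_tilt_deg))


p_limb = limb_alignment_probability(2.0)   # about 0.02
p_pole = pole_on_probability(18.0)         # about 0.05
print(f"limb alignment: {p_limb:.1%}, pole-on within 18 deg: {p_pole:.1%}")
```

Both values are close to the 2% and 5% quoted in the caption, which suggests the figures follow from geometric arguments of this kind.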


unless very special geometries occurred (see below). Using a circular 
model with three free parameters (R_E = a′, f_c, g_c) and adopting solution 
2, we obtain R_E = 1,163 ± 6 km (1σ formal error). The minimum χ² 
per degree of freedom, χ²_pdf = 1.38, indicates a satisfactory fit to the data 
(Supplementary Table 3). Moreover, the r.m.s. radial residual of 2.1 km 
is fully consistent with our formal timing errors. We cannot exclude, 
however, the possibility that random topographic features with amplitude 
approximately ±3 km exist along the limb, which would result in 
a slightly larger error bar for Eris’ radius, R_E = 1,163 ± 9 km (see 
Supplementary Information). Solution 1 provides R_E = 1,140 km, 
but with a high value χ²_pdf = 30.7 (5.5σ level), and radial residuals of 
+11 km and −16 km at the beginning and end of the San Pedro chord, 
respectively. Topographic features of this size are unlikely on such a 
large icy body. This indicates that the spherical assumption is not 
correct for solution 1, and explains why we do not provide a formal 
error bar for that value. 

Allowing for a non-zero flattening of Eris’ limb, we find an infinity 
of possible solutions by fixing the position angle P and semi-major axis 
a’ at various values. If Eris’ rotation axis and Dysnomia’s orbital pole 
are aligned, we find values of R_E in the range 1,105-1,155 km, smaller 
than the value 1,163 km derived above. Relaxing the constraint on Eris’ 
orientation, we find that elliptical limb models can satisfactorily fit the 
occultation chords in 68.3% of the cases (1σ level) for R_E in the range 
1,165 ± 90 km (Supplementary Fig. 4). However, as R_E departs from 
1,165 km, the flattening must rapidly increase, requiring fast rotations 
which are not supported by observations. The extreme case of 
R_E = 1,500 km previously found can be ruled out, as it requires fine 
tunings in both Eris’ limb and pole orientations (Fig. 2). Thus, the most 
straightforward interpretation of our observations is that Eris is close 
to spherical, remembering that larger sizes are possible in a narrow 
region of the parameter space. Consequently, Eris is close in size to 
Pluto, whose radius is estimated between 1,150 and 1,200 km. 

Our radius value implies a density of ρ = 2.52 ± 0.05 g cm⁻³, when 
combined with Eris’ mass. This is comparable to Haumea’s density 
(~2.6 g cm⁻³), for which a typical rock/ice ratio of 0.85/0.15 
is derived. This suggests that Haumea (and thus also Eris) is a large 
rocky body with a thin overlying ice shell. Note that the densities of 
trans-Neptunian objects (TNOs) span a large range, with ρ values of 
1.0, 1.6 and 2.0 g cm⁻³ for Varuna, Charon and Pluto, respectively, 
pointing to diverse origins and/or evolutions. Our radius value 
provides a geometric albedo of p_V = 0.96 (+0.09/−0.04) in the visible range 
(Supplementary Information). This makes Eris almost as bright as a 
perfect isotropic Lambert surface (for which p_V = 1 by definition), and 
one of the intrinsically brightest objects of the Solar System. For comparison, 
Saturn’s satellite Enceladus is even brighter, with a geometric 
albedo of p_V ~ 1.4, associated with its geologically active surface. In 
contrast, Eris’ brightness and lack of light-curve variations may stem 
from the collapse of a nitrogen atmosphere (see below). We find that 
Eris is brighter than the TNO 2002 TX300, whose high albedo 
(0.88 (+0.15/−0.06)) is probably due to the exposure of fresh water-ice. 
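
The density follows directly from the mass and radius quoted in this letter. The sketch below assumes a sphere and, as a further assumption of ours, propagates the two quoted uncertainties in quadrature as if they were independent; the function names are illustrative.

```python
import math


def bulk_density_g_cm3(mass_kg, radius_km):
    """Mean density of a sphere, in g cm^-3."""
    mass_g = mass_kg * 1.0e3
    radius_cm = radius_km * 1.0e5
    volume_cm3 = 4.0 / 3.0 * math.pi * radius_cm ** 3
    return mass_g / volume_cm3


def density_uncertainty(rho, mass, d_mass, radius, d_radius):
    """First-order error propagation for rho proportional to M / R^3."""
    rel = math.hypot(d_mass / mass, 3.0 * d_radius / radius)
    return rho * rel


rho = bulk_density_g_cm3(1.66e22, 1163.0)
d_rho = density_uncertainty(rho, 1.66e22, 0.02e22, 1163.0, 6.0)
print(f"density = {rho:.2f} +/- {d_rho:.2f} g/cm^3")
```

With M_E = 1.66 × 10²² kg and R_E = 1,163 km this gives about 2.52 g cm⁻³ with an uncertainty near 0.05 g cm⁻³, in line with the stated value. The radius term dominates the error budget because the density scales as the inverse cube of the radius.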

We now reassess Eris’ surface temperature in the light of our new 
results. Measurements by the Spitzer and IRAM telescopes imply 
disk-averaged brightness temperatures of T_b = 30 ± 1.2 K and 
T_b = 38 ± 7.5 K at 70 and 1,200 μm, respectively. As a completely 
absorbing surface at Eris’ distance has a temperature T_0 = 40 K, the 
second value is surprisingly high (and consistent with the fact that the 
previously found radius of 1,500 km is about 30% higher than our 
value), but we note that a unique brightness temperature T_b ~ 31 K 
matches both the Spitzer (at 70 μm) and IRAM (1,200 μm) measurements 
at the 1.5σ level (Supplementary Fig. 5). However, Eris’ surface 
temperature is probably not uniform, because an atmosphere (if any) 
would be too tenuous to isothermalize the surface frosts, as occurs for 
Triton and Pluto. We therefore consider two extreme standard temperature 
distribution models, corresponding to (1) a warmer slow 
rotator (or equivalently, pole-on orientation, or zero thermal inertia, 
the standard thermal model, STM) with sub-solar temperature T_ss, and 
(2) a cooler fast rotator with equator-on geometry (isothermal with 
latitude model, ILM), with equatorial temperature T_eq. 

In the STM, both Spitzer and IRAM fluxes are reproduced satisfactorily 
with T_ss ~ 35 K (Supplementary Fig. 5, Supplementary Tables 
4 and 5). The thermal equilibrium equation T_ss = T_0[(1 − p_V q)/(εη)]^(1/4) 
then provides a relationship between the beaming factor η 
(describing the effects of surface roughness), the phase integral q 
and the surface emissivity ε, where A = p_V q is the Bond albedo, which 
measures the fraction of reflected solar energy. Using a standard 
value ε = 0.9 and a plausible range from η = 1 (no roughness) to 
0.7 (large surface roughness), we obtain q = 0.49-0.66, fully consistent 
with the values for Saturn’s brightest icy satellites. The ILM in 
contrast leads to the extreme condition 0 < q < 0.24, which is an 
implausible range as bright objects also have large phase integrals. 
Essentially, the fast rotator model does not provide enough thermal 
flux given the new, smaller size of Eris. We therefore strongly favour 
the STM, implying either a pole-on orientation or a very small thermal 
inertia, as observed in other TNOs. 
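
The quoted range of the phase integral in the STM case can be recovered by inverting the thermal equilibrium equation for q. This minimal sketch uses only the numbers given in the text (T_ss ≈ 35 K, T_0 = 40 K, p_V = 0.96, ε = 0.9, η between 1 and 0.7); the function name is illustrative.

```python
def phase_integral(t_ss, t_0, p_v, emissivity, beaming):
    """Solve T_ss = T_0 * [(1 - p_V q) / (eps * eta)] ** (1/4) for q."""
    ratio4 = (t_ss / t_0) ** 4
    return (1.0 - emissivity * beaming * ratio4) / p_v


T_SS = 35.0   # sub-solar temperature in the STM, K
T_0 = 40.0    # temperature of a fully absorbing surface at Eris, K
P_V = 0.96    # visible geometric albedo
EPS = 0.9     # surface emissivity

q_smooth = phase_integral(T_SS, T_0, P_V, EPS, beaming=1.0)  # no roughness
q_rough = phase_integral(T_SS, T_0, P_V, EPS, beaming=0.7)   # large roughness
print(f"q ranges from {q_smooth:.2f} to {q_rough:.2f}")
```

The two limiting beaming factors give q of about 0.49 and 0.66, matching the range stated above. The corresponding Bond albedo A = p_V q then lies between roughly 0.47 and 0.63.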

The occultation puts an upper limit on a putative atmosphere 
around Eris. As discussed in Supplementary Information, our preferred 
model is an isothermal N₂ atmosphere near 30 K, for which 
we can place an upper limit of about 1 nbar (1σ level) at the surface 
(Fig. 3). Similar limits are obtained for hypothetical CH₄ or Ar atmospheres. 
Also discussed in Supplementary Information is the possibility 
that a Pluto-like atmosphere sublimates as Eris approaches its perihelion, 
at 37.8 AU from the Sun. In that case, Eris would currently be a 

Figure 3 | Upper limit on Eris’ atmosphere. Each data point (filled coloured 
circles) obtained at three of the stations shown in Fig. 2 has been projected onto 
a radial scale (distance from Eris’ centre), using the circular solution 2 displayed 
in Fig. 2. The horizontal bars indicate the finite radial resolution associated with 
finite integration intervals; the vertical dotted line shows the adopted Eris 
radius, R_E = 1,163 km. The black solid line is a model light curve obtained at 
1-km resolution, using an isothermal pure N₂ atmosphere. Black crosses mark 
the expected flux associated with each data point, once the convolution with the 
finite integration intervals has been performed. The fit minimizes the 
differences between the black crosses and the corresponding data points (filled 
circles). The model shown here is the 3σ-level upper limit of an isothermal N₂ 
atmospheric profile, with T = 27.7 K and a surface pressure of 2.9 nbar. Most of 
the constraint on the model comes from the two data points obtained at La Silla 
(the two green filled circles just right of the vertical dotted line), corresponding 
to the data points obtained just before and just after the event (Fig. 1). The two 
closest San Pedro data points (red) have only a small contribution to the χ² 
value, while the CASLEO data points (blue) are too far away from Eris’ edge 
(~200 km) to effectively constrain the atmospheric pressure. Using solution 1 
instead of solution 2 for Eris’ shape would have a minimal impact on the 
atmospheric upper limit, as this would slightly displace the San Pedro data 
points in the plot, leaving the La Silla points where they are shown here. 


dormant Pluto twin, with a bright icy surface created by a collapsed 
atmosphere. Detailed models are required, however, to confirm this 
scenario. 

Laser cooling and real-time measurement of the 
nuclear spin environment of a solid-state qubit 


E. Togan¹*, Y. Chu¹*, A. Imamoglu² & M. D. Lukin¹ 


Control over quantum dynamics of open systems is one of the 
central challenges in quantum science and engineering. Coherent 
optical techniques, such as coherent population trapping involving 
dark resonances, are widely used to control quantum states of 
isolated atoms and ions. In conjunction with spontaneous emission, 
they allow for laser cooling of atomic motion, preparation 
and manipulation of atomic states, and rapid quantum optical 
measurements that are essential for applications in metrology. 
Here we show that these techniques can be applied to monitor and 
control individual atom-like impurities, and their local environment, 
in the solid state. Using all-optical manipulation of the 
electronic spin of an individual nitrogen-vacancy colour centre in 
diamond, we demonstrate optical cooling, real-time measurement 
and conditional preparation of its nuclear spin environment by 
post-selection. These methods offer potential applications ranging 
from all-optical nanomagnetometry to quantum feedback control 
of solid-state qubits, and may lead to new approaches for quantum 
information storage and processing. 

Over the past two decades, coherent population trapping (CPT) has 
been used in the laser cooling of neutral atoms and ions, the creation 
of ultracold molecules, optical magnetometry, and atomic clocks, 
as well as in slowing and stopping light pulses. The electronic spin of 
the nitrogen-vacancy (NV) centre is a promising system for extending 
these techniques to the solid state. The NV centre has a long-lived spin 
triplet as its electronic ground state, whose m_s = ±1 and m_s = 0 
sublevels we denote by |±1⟩ and |0⟩. In pure samples, the electron 
spin dynamics are governed by interactions with the spin-1 ¹⁴N nucleus 
of the NV centre and spin-1/2 ¹³C nuclei present in 1.1% natural 
abundance in the diamond lattice (Fig. 1a). Control over nuclear 
spins is of interest both for fundamental studies and in applications 
such as nanoscale magnetic sensing and the realization of quantum 
networks. Here we achieve such control by two complementary 
methods: effective cooling of nuclear spins through nuclear-state-selective 
CPT, and conditional preparation based on fast measurements 
of the nuclear environment and subsequent post-selection. 
Whereas most previous work involved the use of microwave and 
radio-frequency fields to manipulate both the electronic and the nuclear 
spin states, we use all-optical control of the electronic spin. 
Specifically, we make use of Λ-type level configurations involving the 
NV centre’s optically excited electronic states |A₁⟩ and |A₂⟩ and the 

Figure 1 | Coherent population trapping in NV centres. a, The Λ-type 
transitions between the ground states |±1⟩ and excited states |A₁⟩ and |A₂⟩ of a 
single NV centre are addressed with a CPT laser, while a recycling laser drives 
the |0⟩ → |E_y⟩ transition. An external magnetic field, B, is applied using a 
solenoid. b, Photon counts from NV centre NVa in a 300-μs window are plotted 
versus the applied field for three different powers of a laser addressing the state 
|Ad): blue, 10 μW; pink, 3 μW; yellow, 0.1 μW. The blue and pink data sets are 
shifted vertically by five and, respectively, two counts for clarity. μ_B, Bohr 
magneton; g, gyromagnetic ratio. c, Width of individual ¹⁴N CPT lines versus 
CPT laser power when the |A₁⟩ (blue) or |A₂⟩ (pink) state is used. Solid curves 
represent the theoretical model discussed in the main text and Supplementary 
Information. Error bars, s.d. 


¹Department of Physics, Harvard University, Cambridge, Massachusetts 02138, USA. ²Institute for Quantum Electronics, ETH-Zurich, CH-8093 Zurich, Switzerland. 


*These authors contributed equally to this work. 


27 OCTOBER 2011 | VOL 478 | NATURE | 497 




[Figure 2 graphic, panels a-d: level diagrams showing a nuclear flip; pulse sequence with prepare, read-out and verify steps; plots of read-out count versus read-out magnetic field (MHz).] 


Figure 2 | Optical control and conditional preparation of the proximal ¹⁴N 
nuclear spin. a, Mechanism for optical pumping of ¹⁴N states. b, Pulse 
sequence for ¹⁴N optical pumping using the laser addressing the state |A₁⟩ (A₁ 
laser) and a fixed read-out of the prepared state using the A₂ laser. To ensure 
that the nitrogen-vacancy was not ionized for all subsequent data runs, we turn 
on all three lasers at the end of each run so that there is no dark state, and only 
keep data from runs in which we obtain a high number of counts during this 
verification step. c, Counts collected with NVa during the read-out step versus 
the read-out magnetic field when there is no preparation step (blue) and when 


|±1⟩ ground states'*”* (Fig. 1a). At low temperatures (<10 K) and in 
the limit of zero strain, |A₁⟩ and |A₂⟩ are entangled states of spin and 
orbital momentum, both coupled to |+1⟩ with σ₋-polarized light and to 
|−1⟩ with σ₊-polarized light. Correspondingly, excitation with linearly 
polarized light drives the NV centre into a ‘dark’ superposition state 
when the two-photon detuning is zero’. In the present case, the two- 
photon detuning is determined by the Zeeman splitting between the 
|±1⟩ states due to the combined effect of the Overhauser field originating 
in the nuclear spin environment and any externally applied magnetic 
field®'°. When the external field exactly compensates the Overhauser 
field, the electronic spin of the NV centre is pumped into the dark state 
after a few optical cycles and remains there, resulting in vanishing 
fluorescence. This is the basis of the dark resonances and CPT. 
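
The dark-state argument above can be checked with a minimal sketch of a generic three-level Λ system. This is an illustration under stated assumptions, not the authors' model: the basis ordering (|+1⟩, |−1⟩, |A⟩), the real-valued Rabi frequencies `omega_plus` and `omega_minus`, the rotating-frame Hamiltonian and all function names are hypothetical choices, and strain, the |0⟩ state and the nuclear spins are ignored.

```python
import math


def lambda_hamiltonian(omega_plus, omega_minus, delta, one_photon_detuning=0.0):
    """Rotating-frame Hamiltonian of a generic Lambda system.

    Basis order: (|+1>, |-1>, |A>). `delta` is the two-photon detuning,
    split symmetrically between the two ground states (arbitrary units).
    """
    return [
        [delta / 2.0, 0.0, omega_plus / 2.0],
        [0.0, -delta / 2.0, omega_minus / 2.0],
        [omega_plus / 2.0, omega_minus / 2.0, -one_photon_detuning],
    ]


def apply(matrix, vector):
    """Matrix-vector product for 3x3 real matrices."""
    return [sum(matrix[i][j] * vector[j] for j in range(3)) for i in range(3)]


def dark_and_bright(omega_plus, omega_minus):
    """Ground-state superpositions decoupled from / coupled to the light."""
    norm = math.hypot(omega_plus, omega_minus)
    dark = [omega_minus / norm, -omega_plus / norm, 0.0]
    bright = [omega_plus / norm, omega_minus / norm, 0.0]
    return dark, bright


def dark_state_couplings(omega_plus, omega_minus, delta):
    """Return (<A|H|dark>, <bright|H|dark>) for the given two-photon detuning."""
    h = lambda_hamiltonian(omega_plus, omega_minus, delta)
    dark, bright = dark_and_bright(omega_plus, omega_minus)
    h_dark = apply(h, dark)
    to_excited = h_dark[2]
    to_bright = sum(b * x for b, x in zip(bright, h_dark))
    return to_excited, to_bright


if __name__ == "__main__":
    for delta in (0.0, 0.2):
        print(delta, dark_state_couplings(1.0, 1.0, delta))
```

In this toy model the dark superposition never couples directly to the excited state, but a non-zero two-photon detuning mixes it into the bright superposition, which does fluoresce. Only at zero detuning is the dark state stationary, which is the condition the text attributes to the external field compensating the Overhauser field.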

In our experiments, |A₁⟩ and |A₂⟩ are separated in energy by an 
amount corresponding to ~3 GHz and are addressed individually with 
a single, linearly polarized laser at near-zero magnetic field. Because 
there is a finite branching ratio from the m_S = ±1 manifold of the 
electronic spin into state |0⟩, we use a recycling laser that drives the 
transition between |0⟩ and the excited state |E_y⟩, which decays with a 
small but non-vanishing probability (~10⁻²) back to the states |±1⟩. 
In Fig. 1b, we present our observations of the CPT spectrum as a 
function of an external magnetic field at three different powers of a 
laser tuned to the |±1⟩ → |A₁⟩ transition. A broad resonance is 
observed at high powers, but as the power is reduced we clearly resolve 
three features in the spectrum, separated by 4.4 MHz, which is twice 
the hyperfine splitting between three ¹⁴N nuclear spin states. This 
separation corresponds to the magnetic field required to bring the 
electronic m_S = ±1 hyperfine states with equal nuclear spin 
projections (m_I = ±1, 0) into two-photon resonance. 

The dependence of the CPT resonance width on the laser power 
(Fig. 1c) shows that repumping on the transition |0⟩ ↔ |E_y⟩ has an 
important role in our experiment. By contrast with a conventional, 
closed three-level system, this recycling transition can be used to enhance 
there is preparation with optical pumping using 100 nW of A₁ laser power for 
1.9 ms (pink). The yellow curve shows the results of ¹⁴N polarization through 
measurement-based preparation by selecting the read-out events in which the 
number of counts collected during the last 500 μs of preparation is zero 
(Supplementary Information, section 4.2). d, Steady-state population in the 
m_I = 0 state after optical pumping for varying powers of the A₁ laser, with 
theoretical model described in Supplementary Information (solid line). Error 
bars, s.d. 


the utility of our CPT system by both decreasing the width of the CPT 
resonance and increasing the signal-to-noise ratio. The state |A₁⟩ decays 
into the m_S = 0 ground state through the NV centre’s metastable singlet 
state with a substantial probability, of ~40% (Supplementary 
Information). However, the population returns to the m_S = ±1 state from |E_y⟩ 
after only ~100 optical excitation cycles. As a result, away from the two- 
photon resonance, the NV centre quickly decays to |0⟩ after being 
excited, and then scatters many photons through the transition 
|0⟩ ↔ |E_y⟩ before returning to the Λ system (the Λ-type level 
configuration described above). If the NV centre is not in a dark state, this process 
effectively increases the number of photons we collect by a factor of 
2/η = γ_s1/γ_ce, where γ_ce is the cross-transition rate from |E_y⟩ into |±1⟩ 
and γ_s1 is the transition rate from |A₁⟩ to the singlet. The cycling effect 
also reduces the width of the CPT line because the |0⟩ ↔ |E_y⟩ transition 
quickly saturates away from two-photon resonance, provided that 
the CPT laser excitation rate exceeds the rate of leakage out of the 
recycling transition. Significantly, both of these effects lead to 
improved sensitivity of dark resonances to small changes in two- 
photon detuning. 

To demonstrate this increase in sensitivity, in Fig. 1c we compare the 
widths of dark resonances observed through excitation of |A₁⟩ and 
|A₂⟩. Through independent measurements of the branching ratios 
(Supplementary Information), we determined that |A₁⟩ and |A₂⟩ 
correspond to an open and, respectively, a nearly closed Λ system, with 
η_A1 ≈ 3.1 × 10⁻² and η_A2 ≈ 2.6. We compare these experimental results 
with a theoretical model described in Supplementary Information, 
which predicts that the resonance linewidth, δ₀, is given by 


$$\delta_0 = \frac{R_A}{\sqrt{(1 + 2/\eta)\,(R_A/R_E + 2R_A/\gamma)}} \approx \sqrt{\frac{R_A R_E\,\eta\,\gamma}{R_E + \gamma}}$$
for small η, where R_A and R_E correspond to the optical excitation rates 
of a laser tuned to states |A₁⟩ or |A₂⟩ and, respectively, |E_y⟩, and γ is the 
decay rate of excited states. The width at low powers is determined by 
the random magnetic field associated with surrounding ¹³C nuclear 
states. When this line broadening mechanism is taken into account 
(Supplementary Information), the experimental results are in excellent 
agreement with these predictions, which we plot as solid lines in 
Fig. 1c. 

Having resolved the hyperfine coupling between the nitrogen- 
vacancy electron spin and ¹⁴N spin, we now demonstrate optical 
cooling of the nuclear spin states using dark resonances. This method 
(Fig. 2a) is reminiscent of laser cooling of atomic motion by 
velocity-selective CPT**. A redistribution of the ¹⁴N spin state population 
on optical excitation takes place because the hyperfine coupling in the 
excited electronic state of the NV centre is enhanced by a factor of ~20 
relative to that in the ground state”*. If the external field is set such that, 
for example, the m_I = 0 hyperfine states are in two-photon resonance, 
only the states with nuclear configuration m_I = ±1 will be promoted to 
the excited states, where flip-flops with the electron spin will change 
the nuclear spin state to m_I = 0. When the NV centre spontaneously 
decays into the dark superposition of electronic spin states, optical 
excitation will cease, resulting in effective polarization (cooling) of 
nuclear spin into the m_I = 0 state. 

In Fig. 2c, we present our observations of laser cooling of ¹⁴N 
nuclear spins by CPT. For each point, the pulse sequence shown in Fig. 2b 
is applied, where the magnetic field B_prep is kept at zero during the 
preparation/optical pumping process, and fluorescence is collected 
when the field is switched to a particular read-out value, B_RO. The 
increased contrast of the m_I = 0 CPT line relative to the other two 
corresponds to a nuclear spin polarization of 61.5 ± 4.4%. As shown in 
Fig. 2d, by optimizing the power of the laser tuned to |A₁⟩ we achieve a 
maximum nuclear polarization of 76.4 ± 4.4% over a timescale of 
353 ± 34 μs. The degree of polarization is probably limited by the 
escape rate out of the dark state due to off-resonance excitation of 
|A₂⟩ and dephasing caused by surrounding ¹³C nuclei. A simple 
theoretical model taking into account these two processes and using 
independently measured parameters (Fig. 2d, solid line) reproduces the 
qualitative features of our experimental results. 

We can further improve the preparation of ¹⁴N nuclear spins in a 
desired state by measurement and post-selection, as predicted by 
theoretical proposals’®''*. Specifically, the observation of zero 
photodetection events during the preparation step at B_prep = 0 determines 
the ¹⁴N spins to be in the m_I = 0 state. For instances in which such a 
measurement result was obtained, the bottom plot of Fig. 2c shows the 
nuclear spin populations measured during the subsequent probe step. 
The resulting ¹⁴N polarization, of more than 92 ± 6%, demonstrates 
that high-fidelity conditional preparation of nuclear spins is possible. 

We next extend our technique to monitor and cool the many-body 
environment of the NV centre, which consists of ¹³C nuclei distributed 
throughout the diamond lattice. The large number of nuclear spin 
configurations associated with an unpolarized environment results 
in a random Overhauser field, B_ov, with unresolved hyperfine lines. 
This produces a finite CPT linewidth in measurements that average 
over all configurations of the ¹³C spin bath (Fig. 1b, c). We can 
overcome this limitation by making fast measurements. The key idea of our 
approach is to use the long correlation time, T}"°, associated with 
evolution of the nuclear bath, to observe its instantaneous state and 
its dynamics. Such a fast measurement is illustrated in Fig. 3a, where 
the externally applied field is ramped across a single ¹⁴N m_I = 0 line 
while the CPT lasers are on. The yellow points in Fig. 3c show the line 
shape averaged over many experimental runs, and the intensity plot in 
Fig. 3b shows counts collected in 80-μs time bins during successive 
individual runs, many of which distinctly show a narrow dark region. 
Lorentzian fits to selected experimental scans (Fig. 3c, blue and pink 
curves) reveal ‘instantaneous’ CPT resonances with linewidths that are 
more than a factor of three less than those of the averaged measurement. 
Figure 3 | Observation of instantaneous Overhauser field from the ¹³C spin 
bath. a, Pulse sequence for real-time measurement of the ¹³C nuclear 
configuration. The applied magnetic field is ramped over a single ¹⁴N CPT line 
over 5 ms while counts are collected in 80-μs bins. b, Counts from 200 
successive runs are shown on horizontal lines for NVb. Runs in which the 
verification step fails are blacked out. The centres of constrained Lorentzian fits 
(Supplementary Information, section 7) to individual runs are indicated with 
green dots. c, Two such individual runs are shown with their fits (pink and 
blue), along with an average of scans that passed verification (yellow). 

d, Autocorrelation, R(t), of counts with magnetic field fixed at the m; = 1 My 
line. The fit is to a bi-exponential decay. 


The change in the centres of the dark regions (Fig. 3b, green dots) 
indicates that the instantaneous field evolves in time. 

To provide more quantitative insight into the dynamics of the 
nuclear environment, we record the fluorescence counts at a fixed value of 
the external magnetic field with a time resolution of 80 μs during 
50-ms time intervals. The autocorrelation of the resulting photon 
detection events (Fig. 3d) reveals two distinct timescales: 
τ₁ = 350 ± 30 μs, consistent with the ¹⁴N nuclear spin polarization 
timescale, and τ₂ = 8.40 ± 0.20 ms. Most notably, because we can 
detect dark states of the NV centre with a resolution of 80 μs, these 
results indicate that reliable measurement of the Overhauser field is 
possible within its correlation time. 
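
To make the comparison of timescales concrete, the following sketch evaluates a bi-exponential autocorrelation model and counts how many 80-μs bins fit inside each fitted decay time. The two time constants and the bin width are taken from the text; the functional form with equal amplitudes `a1` and `a2` is an assumption made only for illustration, because the fitted amplitudes are not given here.

```python
import math

TAU_1_US = 350.0   # short timescale quoted in the text (microseconds)
TAU_2_US = 8400.0  # long timescale quoted in the text (microseconds)
BIN_US = 80.0      # time resolution quoted in the text (microseconds)


def biexponential(t_us, a1=0.5, a2=0.5, tau1_us=TAU_1_US, tau2_us=TAU_2_US):
    """Bi-exponential decay model; the amplitudes are hypothetical placeholders."""
    return a1 * math.exp(-t_us / tau1_us) + a2 * math.exp(-t_us / tau2_us)


def bins_per_timescale(tau_us, bin_us=BIN_US):
    """Number of measurement bins that fit inside one decay time."""
    return tau_us / bin_us


if __name__ == "__main__":
    print("bins within tau_1:", bins_per_timescale(TAU_1_US))
    print("bins within tau_2:", bins_per_timescale(TAU_2_US))
    print("model value after one bin:", biexponential(BIN_US))
```

With these numbers, roughly a hundred measurement bins fit inside the slow decay time, which is the sense in which the Overhauser field can be measured within its correlation time.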

We now demonstrate how fast measurements can be used to 
prepare conditionally the ¹³C environment of the NV centre in a desired 
state with post-selection. We record counts accumulated during both 
the preparation step and the read-out step with relatively low power 
using the sequence shown in Fig. 4a. Similar to measurement- 
based preparation of the ¹⁴N spin, by conditionally selecting zero- 
photon-detection events during the preparation step, we can select 
the states of the ¹³C environment that have vanishing two-photon 
detuning (δ = 2gμ_B(B_prep + B_ov) = 0). The pink curve in Fig. 4b shows 
(unconditioned) read-out counts recorded following the preparation 
step, whereas the blue curve shows the results of (conditioned) 
measurement-based preparation. The measured width of such a 
conditionally prepared distribution is significantly smaller than the width 
corresponding to individual ¹⁴N resonances obtained without 
preparation. We find that although this width depends on B_prep, so too 
does the position of the narrow feature, indicating that we can 
conditionally prepare the ¹³C environment by post-selection with a 
Figure 4 | Measurement-based preparation of ¹³C spin bath. a, Pulse 
sequence for preparation and subsequent measurement of the ¹³C 
configuration. B_prep is set within the central ¹⁴N line and B_RO is varied to cover 
all associated ¹³C states. A green laser pulse and a ¹⁴N optical pumping step with 
the A₁ and E_y lasers (not shown) occur before this. Counts during a 
conditioning window of length T_cond (n_cond) at the end of preparation and 
during the 500-μs read-out window (n_RO) are recorded for each run. The data 
presented are an average of many such experimental runs. b, n_RO for NVb 
versus B_RO is shown in pink with double-Lorentzian fit. The same data set 
analysed by keeping only events with n_cond = 0 is shown in blue. The 


Overhauser field of our choice (Supplementary Fig. 8). The prepared 
configurations are long lived both when unilluminated (>>6 ms; 
Supplementary Fig. 9) and in the presence of laser light, consistent 
with autocorrelation measurements (τ₂ = 8.4 ms; Fig. 3d). 

We now discuss our experimental results and explore the limits of 
our ability to probe and prepare the ¹³C environment. As discussed in 
Methods Summary and illustrated in Fig. 4c, conditional measurement 
prepares an Overhauser field distribution that consists of the broad 
unconditioned distribution suppressed by exp(−CT_cond) and a narrow 
peak with a width δ_c = √(ln 2) δ₀/√(CT_cond), where C is the 
fluorescence rate of the bright state and T_cond is the measurement time. 
The read-out step itself has a ‘resolution’ determined by the dark- 
resonance linewidth, δ₀. The observed features represent a convolution 
of the dark-resonance probe with the conditionally prepared 
distribution. For the conditional preparation to be effective, we require that 
CT_cond > 1 and, therefore, δ_c < δ₀, indicating that the measured CPT 
linewidth will be limited by the read-out step. Experimentally, we find 
that our measured line shapes can be well fitted by a combination of 
two Lorentzian distributions, one narrow and one broad, whose widths 
and positions are only weakly dependent on photodetection time, 
T_cond. However, as T_cond is increased, the relative weight of the narrow 
distribution increases (Fig. 4d). This is consistent with the theoretical 
prediction that the read-out-limited width of the narrow resonance 
does not depend on T_cond, and better discrimination in conditional 
measurements increases the probability that the nuclear spin state is 
prepared in the narrow distribution. 
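
The two features of the conditioned distribution described above, a narrow peak and a broad background suppressed by exp(−CT_cond), follow from the zero-count probability of a Poisson process with detection rate Cδ²/(δ₀² + δ²), as given in Methods Summary. The short derivation below is a sketch that assumes the quoted width δ_c is a half-width at half-maximum and that the unconditioned distribution P(δ) is flat across the narrow peak.

```latex
\begin{align}
P(n=0\mid\delta)
  &= \exp\!\left(-\frac{C T_{\mathrm{cond}}\,\delta^{2}}{\delta_0^{2}+\delta^{2}}\right)
   \approx \exp\!\left(-\frac{C T_{\mathrm{cond}}\,\delta^{2}}{\delta_0^{2}}\right)
   && (|\delta| \ll \delta_0), \\
\frac{1}{2}
  &= \exp\!\left(-\frac{C T_{\mathrm{cond}}\,\delta_c^{2}}{\delta_0^{2}}\right)
   \;\Longrightarrow\;
   \delta_c = \frac{\sqrt{\ln 2}\,\delta_0}{\sqrt{C T_{\mathrm{cond}}}}, \\
P(n=0\mid\delta)
  &\longrightarrow \exp\!\left(-C T_{\mathrm{cond}}\right)
   && (|\delta| \gg \delta_0).
\end{align}
```

The first two lines give the narrow peak and its width; the last line is the uniform suppression factor applied to the broad unconditioned distribution far from resonance.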

Notably, we find that even without conditioning (Fig. 4b, pink line), a 
narrow distribution of nuclear spin configurations around B_prep is 
prepared. This modification of the nuclear distribution is a result of CPT- 
based laser cooling of the ¹³C bath, consistent with the predictions of 
ref. 8. The specific physical mechanism of such cooling probably 
involves electronic-spin-dependent evolution of the ¹³C nuclei, and 
will be discussed in detail in future studies. We emphasize that this 
observation provides a clear indication that the magnetic environment 
is affected by the dynamics of the NV centre, providing direct evidence 
that the nitrogen-vacancy spin dynamics is dominated by the 
Overhauser field rather than external magnetic field fluctuations. 
unprepared ¹³C distribution (the same as yellow curve in Fig. 3c) is shown in 
black for qualitative comparison (shifted by 4.3 counts for clarity). c, Physics of 
conditional preparation through measurement. d, Amplitudes of broad (blue) 
and narrow (pink) distributions versus T_cond for NVb. The same data set was 
used for each point and the length of the conditioning window was changed in 
post-processing. e, Full-width at half-maximum (FWHM) of measured ¹³C 
distribution with (blue) and without (pink) conditional preparation, versus A₁ 
laser power for NVa. Solid lines are theoretical predictions for the read-out 
linewidth, δ₀ (blue), and the CPT linewidth for the unprepared ¹³C distribution 
(pink; same as in Fig. 1c). Error bars, s.d. 


Figure 4e shows how the observed linewidth of the narrow feature 
depends on the CPT laser power. At low powers, the observed width 
reaches a minimum value of 104 ± 49 kHz. This limiting width results 
from the effects of strain splitting of states |±1⟩ on the read-out process 
at zero magnetic field*® (see Supplementary Information for a 
quantitative discussion of effects of strain). Owing to this splitting, very small 
magnetic field changes do not shift the energies of states |±1⟩ to first 
order. Therefore, our CPT read-out signal becomes insensitive to 
Zeeman shifts of the order of twice the strain splitting (Supplemen- 
tary Information). In addition, a minimal linewidth of ~400 kHz was 
obtained for measurements performed with a separate NV centre 
(NVb) subject to higher strain. 

The limit associated with strain splitting can be easily circumvented 
by using a large external magnetic field to split the spin states |±1⟩ and 
two laser frequencies to address the NV centre in a Raman configuration 
near zero two-photon detuning. As described in Supplementary 
Information, a modest increase in collection efficiency by, say, one order 
of magnitude”® would allow us to obtain quantum-limited narrowing of 
the nuclear distribution to δ_c ~ 1/T_cond which, in turn, could be on the 
order of the inverse lifetime of the given nuclear configuration’. 

The experiments reported here offer the intriguing prospect of using 
coherent optical techniques to control nuclear spins surrounding 
quantum emitters. For instance, the technique that we describe can 
be used to study the quantum many-body dynamics of ‘central-spin’ 
models in real time, either in isolation or in the presence of dissipa- 
tion”’. Specific examples of this include nuclear field diffusion that, in 
the presence of CPT lasers, is expected to have statistical properties 
reminiscent of Lévy flights in velocity-selective CPT”*. Furthermore, 
our approach allows for direct application of quantum feedback con- 
trol to drive nuclear spins deterministically into a desired state. This 
may be used to prepare non-classical superposition states of nuclear 
spins analogous to spin-squeezed states in atomic ensembles”, and to 
‘engineer’ collective dissipation in nuclear spin ensembles useful for 
applications in quantum information science, such as the long- 
term storage of quantum states’’. Finally, our method allows for an 
all-optical approach to magnetic sensing*® that may have interesting 
applications in nanoscience’*”®. 
METHODS SUMMARY 


Sample description. The diamond sample used was a natural, high-purity, 
type-IIa diamond with a (111) cut, kept at ~7 K. The data presented in the main text and 
the Supplementary Information came from measurements on three separate NV 
centres in this sample. The first NV centre (NVa) was under relatively low strain 
and had a narrow distribution of ¹³C states. The second, higher-strain NV centre 
(NVb) had a broader distribution of ¹³C states. All experiments on optical cooling 
and conditional preparation were repeated using both of these NV centres, with 
consistent results. Figures 1, 2 and 4e and Supplementary Fig. 3 present measure- 
ments on NVa. The remaining figures in the main text and Supplementary 
Information, excluding Supplementary Fig. 5, present measurements on NVb. 
The third NV centre (NVc) was used for electron spin resonance measurements 
(Supplementary Information, section 5), with which we calibrated the ground- 
state strain for NVa and NVb by assuming that it is proportional to the strain 
measured using the excited states |E_x⟩ and |E_y⟩ (ref. 25). 

Measurement-based preparation of ¹³C environment. We consider the situation 
in which the NV centre is continuously monitored for a time T_cond. The average 
number of photons detected during preparation is given by η(δ)T_cond, where the 
photon detection rate, η(δ) = Cδ²/(δ₀² + δ²), is related to the instantaneous value of 
the Overhauser field through the two-photon detuning, δ (Fig. 4c). Directly after 
preparation, the nuclear state probability distribution determined by conditioning 
on obtaining zero counts is P(δ | n = 0), which is related to the conditional 
probability of a zero-count event, P(n = 0 | δ), by P(δ | n = 0) = P(n = 0 | 
δ)P(δ)/P(n = 0), where P(n) and P(δ) are respectively the unconditional count 
and detuning distributions. For a Poisson-distributed random process of photon 
counts, we find that P(δ | n = 0) ∝ exp(−CT_cond δ²/(δ₀² + δ²))P(δ). As T_cond 
increases, the range of δ for which we obtain zero counts owing to the existence 
of a dark state becomes small, and, for large δ, we expect the average number of 
counts to be high and the probability of detecting n = 0 counts due to shot noise to 
be small. This effectively reduces the width of the conditionally prepared nuclear 
spin distribution. 
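
The conditioning step described above can be illustrated with a small Monte Carlo sketch. It is not the authors' analysis code: the Gaussian unconditioned detuning distribution, the parameter values and all function names are assumptions chosen for illustration. Only the detection-rate formula and the Poisson zero-count probability come from the text.

```python
import math
import random


def detection_rate(delta, c, delta0):
    """Photon detection rate C * delta^2 / (delta0^2 + delta^2) from the text."""
    return c * delta ** 2 / (delta0 ** 2 + delta ** 2)


def simulate_post_selection(n_runs, sigma, c, delta0, t_cond, seed=0):
    """Draw detunings and keep only runs that record zero counts.

    Assumes a zero-mean Gaussian prior of width `sigma` (hypothetical) and
    Poisson counting, so P(n = 0 | delta) = exp(-rate * t_cond).
    """
    rng = random.Random(seed)
    all_detunings, kept_detunings = [], []
    for _ in range(n_runs):
        delta = rng.gauss(0.0, sigma)
        p_zero = math.exp(-detection_rate(delta, c, delta0) * t_cond)
        all_detunings.append(delta)
        if rng.random() < p_zero:
            kept_detunings.append(delta)
    return all_detunings, kept_detunings


def rms(values):
    """Root-mean-square spread of a list of detunings."""
    return math.sqrt(sum(v * v for v in values) / len(values))


if __name__ == "__main__":
    everything, kept = simulate_post_selection(
        n_runs=20000, sigma=1.0, c=10.0, delta0=0.5, t_cond=1.0, seed=1
    )
    print("kept fraction:", len(kept) / len(everything))
    print("rms before:", rms(everything), "rms after:", rms(kept))
```

Increasing `c * t_cond` in this toy model discards more runs but leaves a narrower set of detunings, which mirrors the trade-off between post-selection probability and width described in the text.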
Oxygen-containing mononuclear iron species, namely iron(III)-peroxo, 
iron(III)-hydroperoxo and iron(IV)-oxo, are key intermediates in 
the catalytic activation of dioxygen by iron-containing 
metalloenzymes’ ’. It has been difficult to generate synthetic analogues of 
these three active iron-oxygen species in identical host complexes, 
which is necessary to elucidate changes to the structure of the iron 
centre during catalysis and the factors that control their chemical 
reactivities with substrates. Here we report the high-resolution 
crystal structure of a mononuclear non-haem side-on iron(III)-peroxo 
complex, [Fe(III)(TMC)(OO)]⁺. We also report a series of chemical 
reactions in which this iron(III)-peroxo complex is cleanly converted 
to the iron(III)-hydroperoxo complex, [Fe(III)(TMC)(OOH)]²⁺, 
via a short-lived intermediate on protonation. This iron(III)- 
hydroperoxo complex then cleanly converts to the ferryl complex, 
[Fe(IV)(TMC)(O)]²⁺, via homolytic O-O bond cleavage of the 
iron(III)-hydroperoxo species. All three of these iron species, the 
three most biologically relevant iron-oxygen intermediates, have 
been spectroscopically characterized; we note that they have been 
obtained using a simple macrocyclic ligand. We have performed 
relative reactivity studies on these three iron species which reveal 
that the iron(III)-hydroperoxo complex is the most reactive of the 
three in the deformylation of aldehydes and that it has a similar 
reactivity to the iron(IV)-oxo complex in C-H bond activation of 
alkylaromatics. These reactivity results demonstrate that iron(III)- 
hydroperoxo species are viable oxidants in both nucleophilic and 
electrophilic reactions by iron-containing enzymes. 

TMC (1,4,8,11-tetramethyl-1,4,8,11-tetraazacyclotetradecane) is a 
macrocyclic ligand of remarkable versatility in the field of biomimetic 
chemistry of dioxygen activation by metal complexes. A variety of 
metal complexes of superoxo, peroxo and oxo ligands showing a wide 
range of properties have been recently synthesized and characterized 
using the TMC ligand*"". In the case of iron, TMC complexes of oxo 
and peroxo ligands are known'"”’; however, neither the structure of 
the latter nor that of any other iron(III)-peroxo complex has been 
reported before this study. We report here the X-ray crystal structure 
of a high-spin iron(III)-peroxo complex bearing the TMC ligand, 
[Fe(III)(TMC)(OO)]⁺ (1; Fig. 1). 

The iron(III)-peroxo complex, prepared by reacting [Fe(II)(TMC) 
(CF₃SO₃)₂] with 5 equiv. H₂O₂ in the presence of 2 equiv. 
triethylamine in CF₃CH₂OH at 0 °C (ref. 12), was characterized with 
ultraviolet-visible absorption spectroscopy (Fig. 2a), electrospray 
ionization mass spectrometry (Supplementary Fig. 1), and electron 
paramagnetic resonance (EPR) spectroscopy (Supplementary Fig. 2), 
as reported previously”, as well as resonance Raman spectroscopy 
(Supplementary Fig. 3) and X-ray absorption spectroscopy/extended 
X-ray absorption fine structure (XAS/EXAFS) (Fig. 2b, c). The X-ray 
crystal structure of 1-(ClO₄) revealed the mononuclear side-on 1:1 
iron complex with O₂; the Fe is in a distorted octahedral geometry, 
which arises from the triangular FeOO moiety with a small O-Fe-O 
(‘bite’) angle of 45.03(17)° (Fig. 1; Supplementary Tables 1 and 2). The 
FeOO geometry is similar to the crystallographically characterized 
structure of naphthalene dioxygenase, where dioxygen binds side-on 
close to the mononuclear iron at the active site (1.75 Å resolution, O-O 
distance ~1.45 Å)°. The structurally determined O-O bond length of 
1.463(6) Å in our complex is indicative of peroxo character of the OO 
group’’, as supported by resonance Raman data’*’> (Supplementary 
Figs 3 and 4). It is worth noting that both the O-O bond length and the 
average Fe-O bond length (1.910 Å) of 1 are longer than those of other 
metal(III)-peroxo complexes bearing a series of TMC ligands*"°. The 
structure of 1 was further supported by EXAFS analysis (Fig. 2c), 
which identifies a 2:4 split first shell, with two short Fe-O paths at 
1.92 Å and four longer Fe-N TMC paths at ~2.22 Å (Supplementary 
Table 3, fit 1-3; Supplementary Table 4). 
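
As a consistency check on the reported geometry, the angles of the FeOO triangle can be recomputed from the three bond lengths listed in the Figure 1 caption using the law of cosines. This is a sketch for illustration: the function and constant names are hypothetical, and the quoted standard uncertainties on the bond lengths are not propagated.

```python
import math


def angle_from_sides(side_a, side_b, opposite):
    """Angle in degrees enclosed by side_a and side_b (law of cosines)."""
    cos_theta = (side_a ** 2 + side_b ** 2 - opposite ** 2) / (2.0 * side_a * side_b)
    return math.degrees(math.acos(cos_theta))


# Bond lengths in angstroms, as listed in the Figure 1 caption.
FE_O1 = 1.906
FE_O2 = 1.914
O1_O2 = 1.463

bite_angle = angle_from_sides(FE_O1, FE_O2, O1_O2)   # O1-Fe-O2
fe_o1_o2 = angle_from_sides(FE_O1, O1_O2, FE_O2)     # Fe-O1-O2
fe_o2_o1 = angle_from_sides(FE_O2, O1_O2, FE_O1)     # Fe-O2-O1

if __name__ == "__main__":
    print("O1-Fe-O2:", round(bite_angle, 2))
    print("Fe-O1-O2:", round(fe_o1_o2, 2))
    print("Fe-O2-O1:", round(fe_o2_o1, 2))
```

The recomputed angles agree with the reported values of 45.03(17)°, 67.8(2)° and 67.2(2)° to within the quoted uncertainties, as expected for a planar three-atom unit.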

From the structure of 1 (Fig. 1), all four N-methyl groups point to the 
same side of the peroxo moiety, as observed in other metal(III)-peroxo 
complexes*'®. In the case of an iron(IV)-oxo complex bearing the TMC 
ligand, the N-methyl groups of a Sc³⁺-bound [Fe(IV)(TMC)(O)]²⁺ 
complex are also syn to the oxo ligand'’, whereas those in 
[Fe(IV)(TMC)(O)(CH₃CN)]²⁺ are anti to the oxo ligand”. In addition, 
no axial ligand binds to the iron ion trans to the peroxo ligand in 1, 
which is different from the [Fe(IV)(TMC)(O)(CH₃CN)]²⁺ complex! 
but similar to other metal(III)-peroxo complexes*"” as well as the Sc³⁺- 
bound [Fe(IV)(TMC)(O)]²⁺ complex!*. 

Addition of a slight excess of HClO₄ (for example, 3 equiv. to 1) to 
a solution of 1 in acetone/CF₃CH₂OH (3:1) at −40 °C immediately 
generated a violet intermediate (2) with an electronic absorption band 


Figure 1 | X-ray crystal structure of 1. Structure of [Fe(TMC)(OO)]⁺ 
(1), with thermal ellipsoids drawn at the 30% probability level, produced using 
ORTEP software. Hydrogen atoms are omitted for clarity. Selected bond 
lengths (Å): Fe-O1 1.906(4), Fe-O2 1.914(4), Fe-N1 2.192(4), Fe-N2 2.256(5), 
Fe-N3 2.180(5), Fe-N4 2.273(4), O1-O2 1.463(6). Selected angles (°): O1-Fe- 
O2 45.03(17), Fe-O1-O2 67.8(2), Fe-O2-O1 67.2(2). 
Figure 2 | Ultraviolet-visible spectra and XAS data of 1, 2 and 3. In a-c, data 
for 1, 2 and 3 are shown respectively in blue, red and green. a, Ultraviolet- 
visible spectra of 1, 2 and 3; arrows indicate spectral changes for the conversion 
of 2 to 3 in the reaction of 1 (1 mM) and 3 equiv. HClO₄ in acetone/CF₃CH₂OH 
(3:1) at −40 °C. b, Main panel, Fe K-edge XAS data; inset, expanded pre-edge 
region. Dotted black line shows starting material, high-spin [Fe(II)(TMC)]²⁺, 
for reference. c, Main panel, Fourier transform of EXAFS data (k = 2-16). R, 
bond length; Δ, phase shift of the scattered wave. Inset, EXAFS data (solid lines) 
with final fits (dashed lines); y axis shows EXAFS intensity multiplied by k³. 
These data show striking differences across the series, most of which are the 
result of changes to the first coordination sphere. 


at λ_max = 526 nm, followed by a conversion (t₁/₂ ≈ 60 min) from 2 to 
the corresponding iron(IV)-oxo complex, [Fe(IV)(TMC)(O)]²⁺ (3), 
with an isosbestic point at 735 nm (Fig. 2a; also see Supplementary 
Fig. 5 for the electrospray ionization mass spectrometry of 3). (Here 
λ_max is the wavelength of maximum absorption, and t₁/₂ is the 
half-life.) Intermediate 2 rapidly reverts to 1 on addition of 3 equiv. 
tetramethylammonium hydroxide, suggesting that 1 and 2 are 
interconverted through the previously reported acid-base chemistry 
between iron(III)-peroxo and iron(III)-hydroperoxo species'*’”"*. 
Because the interconversions were fast, the reactions were followed 
using a stopped-flow spectrometer. On addition of 3 equiv. HClO₄ to 1 
in acetone/CF₃CH₂OH (3:1) at −40 °C, the absorption band of 1 at 
λ_max = 782 nm disappeared immediately (<2 ms), but the absorption 
band of 2 at λ_max = 526 nm appeared gradually. The full formation of 2 
took ~100 ms with an observed rate (k_obs) of 93 s⁻¹ at −40 °C 
(Supplementary Fig. 6). These spectral changes indicate that another 
intermediate (2′) exists in the conversion of 1 to 2, with an extremely 
fast conversion of 1 to 2′ followed by the relatively slow (that is, 
~100 ms) conversion of 2′ to 2 (Fig. 3 and Supplementary Fig. 7). 
2′ is likely to be a side-on iron(III)-hydroperoxo species (Fig. 3 and 
Supplementary Fig. 7), although we have been unable to characterize it 
spectroscopically owing to its extremely short lifetime. The reverse 
reaction, which is the conversion of 2 to 1, was also investigated using 
stopped-flow methods. Addition of 3 equiv. of tetramethylammonium 
hydroxide to a solution of 2 at -80 °C resulted in the formation of 1 
with a clear isosbestic point at 635 nm and a k_obs of 19 s⁻¹ 
(Supplementary Fig. 8; see Fig. 3 for reaction pathways). 
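
The quoted rate constant and the ~100 ms formation time can be related with a short sketch, assuming simple single-exponential (pseudo-first-order) kinetics. That assumption, and the function names, are ours for illustration; the text reports only the observed rate constants and approximate timescales.

```python
import math


def half_life(rate_constant):
    """Half-life (s) of a first-order process with the given rate constant (1/s)."""
    return math.log(2.0) / rate_constant


def fraction_converted(rate_constant, time_s):
    """Fraction of starting material converted after time_s seconds."""
    return 1.0 - math.exp(-rate_constant * time_s)


def rate_from_half_life(t_half_s):
    """First-order rate constant (1/s) implied by a half-life in seconds."""
    return math.log(2.0) / t_half_s


K_FORMATION = 93.0           # s^-1, observed rate for formation of 2 (from the text)
T_HALF_DECAY_S = 60.0 * 60   # s, half-life of about 60 min for 2 -> 3 (from the text)

if __name__ == "__main__":
    print("half-life of formation (ms):", 1000.0 * half_life(K_FORMATION))
    print("fraction formed after 100 ms:", fraction_converted(K_FORMATION, 0.100))
    print("implied decay rate (1/s):", rate_from_half_life(T_HALF_DECAY_S))
```

Under this assumption a rate of 93 s⁻¹ corresponds to a half-life of about 7.5 ms, so the reaction is more than 99.99% complete after 100 ms, consistent with the statement that full formation of 2 took about 100 ms. The subsequent decay of 2 to 3 is slower by several orders of magnitude.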

Intermediate 2 was characterized using a variety of spectroscopic 
techniques, including EPR, XAS/EXAFS and resonance Raman. The 
EPR spectrum of a frozen acetone/CF₃CH₂OH (3:1) solution measured 
at 10 K exhibits signals at g = 6.8, 5.2 and 1.96 (Supplementary Fig. 9), 
which are indicative of a high-spin (S = 5/2) Fe(III) species'*’*. The 
EXAFS data for 2, compared to 1 (Fig. 2c), also exhibit a distinct shift 
in the coordination environment, from a 2:4 O:N split first shell in 1 to 
only a single Fe-O path at a distance of 1.85 Å in 2 (Supplementary 
Table 3, fit 2-2). This conversion is evident, as a 2:4 split first shell for 2 
produces unreasonable bond variances (Supplementary Table 3, fit 
2-1). In addition, the remaining Fe-N paths of the TMC ligand have 
contracted to an average distance of 2.17 Å, reflecting a decrease in 
ligation. The Fe K-edge of 2 energetically overlays well with that of 1 
(Fig. 2b), consistent with the assignment of 2 as a high-spin Fe(III) 
system. The 1s → 3d Fe K pre-edge feature of 2 exhibits an increase 
in pre-edge intensity from 17.5 units in 1 to 25.6 units in 2 
(Supplementary Table 5). A pre-edge intensity of 25.6 units is 
substantially larger than those of other six-coordinate or even five-coordinate 
complexes”, thus favouring a five-coordinate structure for 2 
(Supplementary Fig. 10). 

On 531-nm excitation at 77 K, the resonance Raman spectrum of
¹⁶O-labelled 2 in d6-acetone shows two isotopically sensitive bands at
658 and 868 cm⁻¹ (Supplementary Fig. 11). The peak at 658 cm⁻¹
shifts to 633 cm⁻¹ on ¹⁸O-substitution, and is assigned to the Fe-O
stretch. The peak at 868 cm⁻¹ shifts to 820 cm⁻¹ on ¹⁸O-substitution,
and is assigned to the O-O stretch. The O-O stretch of 2 is higher in
wavenumber than those of other high-spin Fe(III)-OOH(R) complexes
(for example, 830 cm⁻¹ for [Fe(H2bppa)(OOH)]²⁺) and is much higher
than those of low-spin Fe(III)-OOH(R) complexes (for example,
790 cm⁻¹ for [Fe(N4Py)(OOH)]²⁺), consistent with the conclusion that
2 is a high-spin Fe(III)-OOH complex. In addition, the Fe-O stretch of
2 is higher than those of six-coordinate high-spin Fe(III)-OOH(R)
complexes (for example, 621 cm⁻¹ for [Fe(H2bppa)(OOH)]²⁺), indicating
a stronger Fe-O bond that would be consistent with the absence of a
trans-axial ligand in 2. This observation is also consistent with the above
XAS results suggesting a five-coordinate model for 2. These spectroscopic
results are further supported by density functional theory calculations
that indicate that a high-spin [Fe(III)(TMC)(OOH)]²⁺ complex
with its methyl groups oriented syn to the OOH ligand does not bind a
trans-axial ligand (Supplementary Fig. 12).
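
The isotope shifts quoted above can be checked against a simple diatomic harmonic-oscillator estimate. The sketch below is only an illustration and is not the authors' analysis: it assumes that each band behaves as an isolated two-atom stretch with an unchanged force constant, and it uses integer mass numbers in place of exact isotopic masses. The function names are hypothetical.

```python
import math

# Integer mass numbers used as approximate atomic masses (an assumption
# made for simplicity; exact isotopic masses change the result only slightly).
FE, O16, O18 = 56.0, 16.0, 18.0


def reduced_mass(m1: float, m2: float) -> float:
    """Reduced mass of a two-atom oscillator."""
    return m1 * m2 / (m1 + m2)


def shifted_wavenumber(nu: float, light: tuple, heavy: tuple) -> float:
    """Harmonic-oscillator estimate of a band position after isotope
    substitution, assuming the force constant does not change."""
    return nu * math.sqrt(reduced_mass(*light) / reduced_mass(*heavy))


# Band positions (cm^-1) are the values quoted in the text.
fe_o_pred = shifted_wavenumber(658.0, (FE, O16), (FE, O18))
o_o_pred = shifted_wavenumber(868.0, (O16, O16), (O18, O18))

print(f"Fe-O: predicted {fe_o_pred:.0f} cm^-1, observed 633 cm^-1")
print(f"O-O:  predicted {o_o_pred:.0f} cm^-1, observed 820 cm^-1")
```

Under these assumptions the estimates fall near 629 and 818 cm⁻¹, within a few wavenumbers of the observed 633 and 820 cm⁻¹, which is the level of agreement one would expect for bands dominated by Fe-O and O-O stretching.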

Knowing that an iron(IV)-oxo complex, [Fe(IV)(TMC)(O)]²⁺ (3), is
formed as the decay product of 2 (Fig. 2; Supplementary Figs 5 and 13),
we considered two possible mechanisms of hydroperoxide O-O bond
cleavage of 2. One is heterolytic O-O bond cleavage of the hydroperoxide
ligand of 2, which would generate an Fe(V)-oxo species,
followed by one-electron reduction of the Fe(V)-oxo species to
give 3 (pathways A and B in Supplementary
Fig. 14). The other possibility is homolytic hydroperoxide O-O
bond cleavage of 2, affording 3 and a hydroxyl radical (pathway C in
Supplementary Fig. 14). Recently, the former mechanism has been
proposed, based on the observation that the formation rate of 3 from
2 was accelerated by increasing the proton concentration in CH3CN
solution. However, under different solvent conditions, we found no
proton concentration effect on the rate of the hydroperoxo O-O bond
cleavage of 2. In fact, the formation rates were essentially the same
irrespective of the proton concentration in acetone/CF3CH2OH (3:1)
and other solvents except CH3CN (Supplementary Table 6).

Figure 3 | Iron-oxygen intermediates. Generation, structural and
spectroscopic characterization, and reactivities of mononuclear non-haem
iron-oxygen intermediates detected in the reactivity studies of 1.

Additional evidence that argues against the formation of an iron(V)-oxo
species via O-O bond heterolysis was obtained by carrying out
reactions in the presence and absence of substrates (Supplementary
Fig. 15). If 3 were in fact the product of the one-electron reduction of an
iron(V)-oxo species, then the amount of 3 formed in the presence of
the substrates should decrease owing to fast reactions between the highly
reactive iron(V)-oxo species and the substrates (pathways A and D in
Supplementary Fig. 14). However, the amounts of 3 formed in the
presence and absence of substrates were the same, implying that an
iron(V)-oxo species was not generated via an O-O bond heterolysis
mechanism in the course of the formation of 3 from 2. It is also
significant that this chemistry, in which a high-spin iron(III)-hydroperoxo
species undergoes O-O bond homolysis, has not been
reported previously for other high-spin iron(III)-hydroperoxo species.
A possible reason is that the Fe(IV)=O product in our reaction is an
intermediate-spin, six-coordinate complex. Thus, changes along the
reaction coordinate in spin state and coordination number could
contribute to the energetics of this O-O bond cleavage.

The detailed reactivities of the three intermediates, iron(III)-peroxo
(1), iron(III)-hydroperoxo (2) and iron(IV)-oxo (3), have been
investigated in both nucleophilic and electrophilic reactions (Fig. 3). The
nucleophilic characters of all three intermediates were tested in aldehyde
deformylation reactions. On addition of 2-phenylpropionaldehyde
(2-PPA) to 1 and 3 in acetone/CF3CH2OH (3:1) at -40 °C, the
intermediates remained intact without any spectral change (Supplementary
Fig. 16). These results indicate that 1 and 3 are relatively unreactive in
nucleophilic oxidative reactions at -40 °C, although 1 did react
with the aldehyde at higher temperature (for example, 15 °C). In
contrast, intermediate 2 reacted rapidly with 2-PPA at -40 °C, resulting
in the disappearance of its characteristic ultraviolet-visible band. This
decay follows a first-order profile and forms 3 with an isosbestic point at
735 nm (Fig. 4a; see Supplementary Fig. 17 for aldehyde concentration
effect). The high reactivity of 2 in nucleophilic reactions, compared to
the side-on iron(III)-peroxo analogue 1, is ascribed to the end-on
binding mode of the hydroperoxo ligand and supported by density
functional theory calculations (Supplementary Fig. 18). The reactivity of
2 was further investigated using primary (1°-CHO), secondary
(2°-CHO) and tertiary (3°-CHO) aldehydes (Fig. 4b), and the observed
reactivity order of 1°-CHO > 2°-CHO > 3°-CHO supports the
nucleophilic character of 2 (Supplementary Fig. 19 shows additional evidence
for the nucleophilic character of 2).
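
A second-order rate constant of the kind compared here is commonly extracted from first-order decays measured with the substrate in large excess: the observed first-order rate constant is plotted against substrate concentration and the slope is taken as k2. The sketch below illustrates that arithmetic only. It assumes this standard pseudo-first-order treatment (the text above does not spell out the fitting procedure), and the concentrations and kobs values are invented for illustration, not data from this work.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical illustration data (NOT measurements from the paper):
# aldehyde concentration in M and observed first-order rate constants in s^-1.
aldehyde_M = [0.010, 0.020, 0.030, 0.040, 0.050]
k_obs = [0.0052, 0.0102, 0.0152, 0.0202, 0.0252]

# Slope = second-order rate constant k2 (M^-1 s^-1);
# intercept = substrate-independent (natural) decay rate (s^-1).
k2, k_natural = fit_line(aldehyde_M, k_obs)
print(f"k2 = {k2:.3f} M^-1 s^-1, natural decay = {k_natural:.4f} s^-1")
```

Fitting an intercept rather than forcing the line through the origin leaves room for the natural decay of the intermediate, which is shown alongside the substrate reaction in Fig. 4a.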

The electrophilic characters of 1, 2 and 3 were also investigated in 
the oxidation of alkylaromatic compounds with weak C-H bonds, 
such as xanthene (75.5 kcal mol⁻¹) and 9,10-dihydroanthracene
(DHA, 77 kcal mol⁻¹). Intermediate 1 did not show any significant
spectral change on addition of substrates in acetone/CF3CH2OH (3:1)
at -20 °C (Supplementary Fig. 20a). In contrast, 2 and 3 reacted with DHA
under the same conditions (Supplementary Figs 20b and 21a), showing
that both 2 and 3 are capable of abstracting a hydrogen atom from
DHA and that 2 has a similar reactivity to 3 in this C-H bond
activation reaction. Second-order rate constants of 8.1 × 10⁻¹ and
2.4 × 10⁻¹ M⁻¹ s⁻¹ were determined in the oxidation of xanthene
and DHA, respectively, by 2 at -20 °C (Supplementary Fig. 21b). On
the basis of the above observations (that the reaction rates depend
on the substrate concentration and the bond dissociation energies of
the substrates; Supplementary Fig. 21b), we conclude that 2 is the
active oxidant in abstracting an H atom from the substrates.

Figure 4 | Reactivity studies of 2 with aldehydes. a, Ultraviolet-visible
spectral changes showing the decay of 2 (1 mM) and formation of 3 on addition
of 2-PPA (50 mM). Inset, time courses of the absorbance change of 2 at 526 nm
for the reaction with 2-PPA (filled circles) and for the natural decay (open
circles). b, Second-order rate constants determined in the reactions of 2 with
pentanal (1°-CHO; triangles), 2-methylbutanal (2°-CHO; circles), and
pivalaldehyde (3°-CHO; squares). Standard deviation is <10% of the data used
in the plot.

In conclusion, iron(III)-hydroperoxo intermediates have been
proposed as active oxidants in cytochrome P450-catalysed deformylation
of aldehydes and activated bleomycin-mediated DNA cleavage via
H-atom abstraction. We have provided here direct experimental
evidence that a high-spin iron(III)-hydroperoxo species is capable
of acting as an active oxidant in both nucleophilic and electrophilic
reactions.


METHODS SUMMARY 


The iron(III)-peroxo complex, [Fe(III)(TMC)(OO)]⁺ (1), was prepared by
reacting [Fe(II)(TMC)(CF3SO3)2] with 5 equiv. H2O2 in the presence of 2 equiv.
triethylamine in CF3CH2OH at 0 °C. In contrast to its highly unstable nature in
CH3CN (refs 12, 22), 1 prepared in CF3CH2OH persisted for several hours at 0 °C,
and this greater thermal stability of 1 in alcoholic solvents allowed for the isolation
of crystals in ~80% yield, which were used for spectroscopic characterization
and reactivity studies. Crystals suitable for structural analysis were obtained from
CH3OH/diethyl ether with excess NaClO4 at -40 °C. See experimental section in
Supplementary Information for detailed experimental conditions and procedures,
spectroscopic and kinetics analyses, and computational calculations.

Technological advances in DNA recovery and sequencing have
drastically expanded the scope of genetic analyses of ancient specimens
to the extent that full genomic investigations are now feasible and
are quickly becoming standard. This trend has important implications
for infectious disease research because genomic data from
ancient microbes may help to elucidate mechanisms of pathogen 
evolution and adaptation for emerging and re-emerging infections. 
Here we report a reconstructed ancient genome of Yersinia pestis at 
30-fold average coverage from Black Death victims securely dated to 
episodes of pestilence-associated mortality in London, England, 
1348-1350. Genetic architecture and phylogenetic analysis indicate 
that the ancient organism is ancestral to most extant strains and sits 
very close to the ancestral node of all Y. pestis commonly associated 
with human infection. Temporal estimates suggest that the Black 
Death of 1347-1351 was the main historical event responsible for 
the introduction and widespread dissemination of the ancestor to 
all currently circulating Y. pestis strains pathogenic to humans, and 
further indicates that contemporary Y. pestis epidemics have their 
origins in the medieval era. Comparisons against modern genomes 
reveal no unique derived positions in the medieval organism, 
indicating that the perceived increased virulence of the disease 
during the Black Death may not have been due to bacterial pheno- 
type. These findings support the notion that factors other than 
microbial genetics, such as environment, vector dynamics and host 
susceptibility, should be at the forefront of epidemiological discus- 
sions regarding emerging Y. pestis infections. 

The Black Death of 1347-1351, caused by the bacterium Yersinia
pestis, provides one of the best historical examples of an emerging
infection with rapid dissemination and high mortality, claiming an
estimated 30-50% of the European population in only a five-year
period. Discrepancies in epidemiological trends between the medieval
disease and modern Y. pestis infections have ignited controversy over
the pandemic’s aetiologic agent. Although ancient DNA investigations
have strongly implicated Y. pestis in the ancient pandemic,
genetic changes in the bacterium may be partially responsible for 
differences in disease manifestation and severity. To understand the 
organism’s evolution it is necessary to characterize the genetic changes 
involved in its transformation from a sylvatic pathogen to one capable 
of pandemic human infection on the scale of the Black Death, and to 
determine its relationship with currently circulating strains. Here we 
begin this discussion by presenting the first draft genome sequence of 
the ancient pathogen. 

Y. pestis is a recently evolved descendant of the soil-dwelling bacillus
Yersinia pseudotuberculosis, which in the course of its evolution
acquired two additional plasmids (pMT1 and pPCP1) that provide it
with specialized mechanisms for infiltrating mammalian hosts. To
investigate potential evolutionary changes in one of these plasmids,
we previously reported the screening of 46 teeth and 53 bones from the
East Smithfield collection of London, England, for the presence of the
Y. pestis-specific pPCP1 (ref. 3). Historical data indicate that the East
Smithfield burial ground was established in late 1348 or early 1349
specifically for interment of Black Death victims (Supplementary Figs 1
and 2), making the collection well suited for genetic investigations of
ancient Y. pestis. DNA sequence data for five teeth obtained via molecular
capture of the full Y. pestis-specific pPCP1 revealed a C to T damage
pattern characteristic of authentic endogenous ancient DNA, and
assembly of the pooled Illumina reads permitted the reconstruction
of 98.68% of the 9.6-kilobase plasmid at a minimum of twofold
coverage.

To evaluate the suitability of capture-based methods for recon- 
structing the complete ancient genome, multiple DNA extracts from 
both roots and crowns stemming from four of the five teeth that
yielded the highest pPCP1 coverage were used for array-based enrichment
(Agilent) and subsequent high-throughput sequencing on the
Illumina GAII platform. Removal of duplicate molecules and subsequent
filtering produced a total of 2,366,647 high-quality chromosomal
reads (Supplementary Table 1a, b) with an average fragment
length of 55.53 base pairs (Supplementary Fig. 4), which is typical for 
ancient DNA. Coverage estimates yielded an average of 28.2 reads per 
site for the chromosome, and 35.2 and 31.2 for the pCD1 and pMT1 
plasmids, respectively (Fig. 1a, c, d and Supplementary Table 1b, c). 
Coverage was predictably low for pPCP1 (Fig. 1e) because probes
specific to this plasmid were not included on the arrays. Coverage
correlated with GC content (Supplementary Fig. 6), a trend previously
observed for high-throughput sequence data. The coverage on each
half of the chromosome was uneven due to differences in sequencing 
depth between the two arrays, with 36.46 and 22.41 average reads per 
site for array 1 and array 2, respectively. Although greater depth con- 
tributed to more average reads per site, it did not increase overall 
coverage, with both arrays covering 93.48% of the targeted regions 
at a minimum of onefold coverage (Supplementary Table 1b). This 
indicates that our capture procedure successfully retrieved template 
molecules from all genomic regions accessible via this method, and 
that deeper sequencing would not result in additional data for CO92 
template regions not covered in our data set. 
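
The reported average depth can be roughly reproduced from the read count and mean fragment length given above. This is a back-of-envelope sketch rather than the authors' calculation: it assumes every filtered read contributes its full length to the chromosome and takes the chromosome length as the rounded 4.6 megabases quoted in the Methods Summary.

```python
# Values quoted in the text.
chromosomal_reads = 2_366_647
mean_fragment_bp = 55.53

# Assumption: rounded chromosome length of 4.6 megabases.
chromosome_bp = 4.6e6

# Average depth = total sequenced bases / target length.
total_bases = chromosomal_reads * mean_fragment_bp
average_depth = total_bases / chromosome_bp

print(f"total bases: {total_bases:,.0f}")
print(f"estimated average depth: {average_depth:.1f} reads per site")
```

The estimate lands between 28 and 29 reads per site, close to the 28.2 reported for the chromosome; the small difference is expected given the rounded genome length and the fact that not every base of every read maps to a targeted region.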

Genome architecture is known to vary widely among extant Y. pestis
strains. To extrapolate gene order in our ancient genome, we analysed
reads mapping to the CO92 reference for all extracts stemming

derived from deaminated cytosines that would have been removed in
the current investigation via uracil-DNA-glycosylase treatment before 
array capture. 

To place our ancient genome in a phylogenetic context, we characterized
all 1,694 previously identified phylogenetically informative positions
(Supplementary Table 4), and compared those from our ancient
organism against aggregate base call data for 17 publicly available
Y. pestis genomes and the ancestral Y. pseudotuberculosis. When
considered separately, sequences from three of the four victims fall only two
substitutions from the root of all extant human pathogenic Y. pestis
strains (Fig. 3a), and they show a closer relationship to branch 1 Y. pestis
than to branch 2; however, one of the four victims (individual 6330) was
infected with a strain that contained three additional derived positions
seen in all other branch 1 genomes. This suggests either the presence of
multiple strains in the London 1348-1350 pandemic or microevolutionary
changes accruing in one strain, which is known to occur in
disease outbreaks. Additional support for Y. pestis microevolution is
indicated by the presence of several variant positions for which
sequence data from one individual show two different nucleotides at
comparable frequencies (Supplementary Table 5). Position 2896636,
for example, is a known polymorphic position in extant Y. pestis
populations, and this position shows the fixed derived state in one
individual (6330) and the polymorphic state in another (individual 8291) at
minimum fivefold coverage (Supplementary Fig. 7). This provides a
remarkable example of microevolution captured during an historical
pandemic. The remaining variant positions are unchanged in the 18
extant Yersinia genomes, thus they may be unique to the ancient
organism and are, therefore, of further interest. Additional sampling
of ancient genomes will assist in determining the frequency of these
mutations in co-circulating Y. pestis strains, and will clarify the
emergence of branch 2 strains that are as yet unreported in ancient samples.

Figure 3 | Phylogenetic placement and historical context for the East
Smithfield strain. a, Median network of ancient and modern Y. pestis based on
1,694 variant positions in modern genomes. Coloured circles represent
different clades as defined in ref. 13. Grey circles represent hypothetical nodes.
b, Phylogenetic tree using 1,694 variable positions. Divergence time intervals
are shown in calendar years, with neighbour-joining bootstrap support (blue
italic) and Bayesian posterior probability (blue). Grey box indicates known

Consistent tree topologies were produced using several construction 
methods and all major nodes were supported by posterior probability 
(pp) values of >0.96 and bootstrap values >90 (Fig. 3b and Sup- 
plementary Figs 8 and 9). The trees place the East Smithfield sequence 
close to the ancestral node of all extant human pathogenic Y. pestis 
strains (only two differences in 1,694 positions) and at the base of 
branch 1 (Fig. 3b). A secure date for the East Smithfield site of 
1348-1350 allowed us to assign a tip calibration to the ancient 
sequence and thus date the divergence time of the modern genomes 
and the East Smithfield genome using a Bayesian approach. Temporal 
estimates indicate that all Y. pestis commonly associated with human 
infection shared a common ancestor sometime between 668 and 729 
years ago (AD 1282-1343, 95% highest probability density, HPD), 
encompassing a much smaller time interval than recently published 
estimates and further indicating that all currently circulating branch
1 and branch 2 isolates emerged during the thirteenth century at the
earliest (Fig. 3b), potentially stemming from an Eastern Asian source
as has been previously suggested. This implies that the medieval
plague was the main historical event that introduced human popula- 
tions to the ancestor of all known pathogenic strains of Y. pestis. This 
further questions the aetiology of the sixth to eighth century Plague of 
Justinian, popularly assumed to have resulted from the same pathogen: 
our temporal estimates imply that the pandemic was either caused by a 
Y. pestis variant that is distinct from all currently circulating strains 
commonly associated with human infections, or it was another disease 
altogether. 
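
The "years ago" interval and the calendar dates quoted above can be cross-checked with simple arithmetic. The sketch below makes no assumption about the reference year; it derives the year implied by each pair of values given in the text. The function name is hypothetical.

```python
def implied_reference_year(years_ago: int, calendar_year_ad: int) -> int:
    """Reference ('present') year implied by a years-ago value and its AD date."""
    return calendar_year_ad + years_ago


# (years ago, AD year) pairs as quoted in the text:
# 668 years ago <-> AD 1343, 729 years ago <-> AD 1282.
bounds = [(668, 1343), (729, 1282)]
reference_years = {implied_reference_year(y, ad) for y, ad in bounds}

interval_width = 729 - 668
print(f"implied reference year(s): {sorted(reference_years)}")
print(f"width of the 95% interval: {interval_width} years")
```

Both bounds imply the same reference year, so the two ways of expressing the interval are internally consistent, and the interval spans 61 years.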

Although our approach of using an extant Y. pestis reference tem- 
plate for bait design precluded our ability to identify genomic regions 
that may have been present in the ancient organism and were sub- 
sequently lost in CO92, genomic comparisons of our ancient sequence 
against its closest outgroups may yield valuable insights into Y. pestis 
evolution. The Microtus 91001 strain is the closest branch 1 and 
branch 2 relative confirmed to be non-pathogenic to humans, hence
genetic changes may represent contributions to the pathogen’s adapta- 
tion to a human host. Comparisons against this outgroup revealed 113 
changes (Supplementary Table 6a, b), many of which are found in 
genes affecting virulence-associated functions like biofilm formation 
(hmsT), iron-acquisition (iucD) or adaptation to the intracellular 
environment (phoP). Similarly, although its virulence potential in 
humans has yet to be confirmed to our knowledge, Y. pestis 
B42003004 isolated from a Chinese marmot population has been
identified as the strain closest to the ancestral node of all Y. pestis 
commonly associated with human plague, and thus may provide key 
information regarding the organism’s evolution. Full genome com- 
parison against the East Smithfield sequence revealed only eight single- 
nucleotide differences (Supplementary Table 6c), six of which result in 
non-synonymous changes (Supplementary Table 6d). Although these 
differences probably do not affect virulence, the influence of gene loss, 
gene gain or genetic rearrangements, all of which are well documented 
in Y. pestis, is as yet undetermined. In more recent evolutionary
terms, single-nucleotide differences in several known pathogenicity- 
associated genes were found between our ancient genome and the 
CO92 reference sequence (Supplementary Table 3), which may rep- 
resent further adaptations to human hosts. 

Through enrichment by DNA capture coupled with targeted
high-throughput DNA sequencing, we have reconstructed a draft genome
for what is arguably the most devastating human pathogen in history, 
and revealed that the medieval plague of the fourteenth century was 
probably responsible for its introduction and widespread distribution 
in human populations. This indicates that the pathogen implicated in 
the Black Death has close relatives in the twenty-first century that are 
both endemic and emerging. Introductions of new pathogens to
populations are often associated with increased incidence and severity
of disease and although the mechanisms governing this phenomenon
are complex, genetic data from ancient infectious diseases will provide
invaluable contributions towards our understanding of host-pathogen
coevolution. The Black Death is a seminal example of an
emerging infection, travelling across Europe and claiming the lives of
an estimated 30 million people in only 5 years, which is much faster
than contemporary rates of bubonic or pneumonic plague infection
and dissemination. Regardless, although no extant Y. pestis strain
possesses the same genetic profile as our ancient organism, our data
suggest that few changes in known virulence-associated genes have
accrued in the organism’s 660 years of evolution as a human pathogen,
further suggesting that its perceived increased virulence in history
may not be due to novel fixed point mutations detectable via the
analytical approach described here. At our current resolution, we posit
that molecular changes in pathogens are but one component of a
constellation of factors contributing to changing infectious disease
prevalence and severity, where genetics of the host population,
climate, vector dynamics, social conditions and synergistic interactions
with concurrent diseases should be foremost in discussions of
population susceptibility to infectious disease and host-pathogen
relationships with reference to Y. pestis infections.


METHODS SUMMARY 


DNA from dental pulp was extracted and converted into sequencing libraries as 
previously described. Potential sequencing artefacts resulting from deaminated
nucleotides were eliminated by treatment of the DNA extracts with uracil-DNA- 
glycosylase and endonuclease VIII. DNA extracts were subsequently converted 
into sequencing libraries and amplified to incorporate unique sequence tags on 
both ends of the molecule. Two Agilent DNA capture arrays were designed for 
capture of the full Y. pestis chromosome (4.6 megabases), and the pCD1 (70 kb) 
and pMT1 (100 kb) plasmids using the modern Y. pestis strain CO92 (accession 
numbers NC_003143, NC_003131, NC_003134) for bait design with 3 bp tiling 
density. Serial array capture was performed over two copies of each array using the 
enriched fraction from the first round of capture as a template for a second round. 
The resulting products were amplified and pooled in equimolar amounts. All 
templates were sequenced for 76 cycles from both ends on the Illumina GAII 
platform, and reads merged into single fragments were included in subsequent 
analyses only if forward and reverse sequences overlapped by a minimum of 11 bp. 
Reads were mapped against the CO92 genome using the software BWA, and 
molecules with the same start and end coordinates were removed with the rmdup 
program in the samtools suite. Reference-guided sequence assembly was per- 
formed using Velvet version 1.1.03, with mapped and unmapped reads supplied 
in separate channels. Single-nucleotide differences were determined at a minimum 
of fivefold coverage and base frequency of at least 95% for both a pooled data set for 
all individuals and one in which all individuals were treated separately. A median 
network was constructed on these base calls using SplitsTree4. Phylogenetic trees 
were constructed using parsimony, neighbour-joining (MEGA 4.1) and Bayesian 
methods, and coalescence dates were determined in BEAST using both a strict and 
a relaxed molecular clock (Supplementary Fig. 9). 
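
The calling thresholds stated above (a minimum of fivefold coverage and a base frequency of at least 95%) can be expressed as a small filter. This is a minimal sketch of that rule under stated assumptions, not the pipeline used in the study: the function names and the plain-string pileup representation are hypothetical, and mapping, base-quality and damage handling are ignored.

```python
from collections import Counter

MIN_COVERAGE = 5      # minimum number of reads covering the site
MIN_FREQUENCY = 0.95  # minimum fraction of reads supporting the called base


def call_base(pileup):
    """Return the consensus base at a site, or None if thresholds are not met.

    `pileup` is an iterable of the bases observed at one reference position.
    """
    counts = Counter(pileup)
    depth = sum(counts.values())
    if depth < MIN_COVERAGE:
        return None
    base, support = counts.most_common(1)[0]
    if support / depth < MIN_FREQUENCY:
        return None
    return base


def call_difference(pileup, reference_base):
    """Return the called base if it differs from the reference, else None."""
    base = call_base(pileup)
    if base is None or base == reference_base:
        return None
    return base


print(call_difference("TTTTTTTT", "C"))    # well supported difference
print(call_difference("TTTT", "C"))        # too few reads
print(call_difference("TTTTTTTTCC", "C"))  # mixed site, below 95%
```

A site such as position 2896636 in individual 8291, where two nucleotides occur at comparable frequencies, would be rejected by this filter as a fixed difference, which is why such polymorphic positions have to be examined separately.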

A natural polymorphism alters odour and DEET
sensitivity in an insect odorant receptor


Maurizio Pellegrino, Nicole Steinbach, Marcus C. Stensmyr, Bill S. Hansson & Leslie B. Vosshall


Blood-feeding insects such as mosquitoes are efficient vectors of 
human infectious diseases because they are strongly attracted by body 
heat, carbon dioxide and odours produced by their vertebrate hosts. 
Insect repellents containing DEET (N,N-diethyl-meta-toluamide) 
are highly effective, but the mechanism by which this chemical wards 
off biting insects remains controversial despite decades of investigation.
DEET seems to act both at close range as a contact chemorepellent,
by affecting insect gustatory receptors, and at long range,
by affecting the olfactory system. Two opposing mechanisms for
the observed behavioural effects of DEET in the gas phase have been
proposed: that DEET interferes with the olfactory system to block
host odour recognition and that DEET actively repels insects by
activating olfactory neurons that elicit avoidance behaviour. Here
we show that DEET functions as a modulator of the odour-gated ion
channel formed by the insect odorant receptor complex. The
functional insect odorant receptor complex consists of a common
co-receptor, ORCO (ref. 15) (formerly called OR83B; ref. 16), and
one or more variable odorant receptor subunits that confer odour
selectivity. DEET acts on this complex to potentiate or inhibit
odour-evoked activity or to inhibit odour-evoked suppression of 
spontaneous activity. This modulation depends on the specific odor- 
ant receptor and the concentration and identity of the odour ligand. 
We identify a single amino-acid polymorphism in the second trans- 
membrane domain of receptor OR59B in a Drosophila melanogaster 
strain from Brazil that renders OR59B insensitive to inhibition by 
the odour ligand and modulation by DEET. Our data indicate that 
natural variation can modify the sensitivity of an odour-specific 
insect odorant receptor to odour ligands and DEET. Furthermore, 
they support the hypothesis that DEET acts as a molecular ‘confusant’
that scrambles the insect odour code, and provide a compelling 
explanation for the broad-spectrum efficacy of DEET against mul- 
tiple insect species. 

Previous work has shown that the odour of Drosophila food potently 
attracts adult D. melanogaster vinegar flies and that DEET blocks this 
attraction. The behavioural effects of DEET require an intact olfactory
system and the olfactory co-receptor ORCO. These results implicated
the olfactory system in the observed behavioural effects but failed both 
to distinguish between the two competing models of action for DEET 
and to determine whether DEET acts on the odour-specific odorant 
receptors, ORCO or both. We carried out electrophysiological record- 
ings of Drosophila olfactory sensory neurons (OSNs) to test these com- 
peting possibilities. 

In response to the suggestion that DEET and odours may interact in
the vapour phase, we first quantified the respective amounts of
vapour-phase 1-octen-3-ol emitted from the stimulus pipette in the
presence and absence of DEET, using solid-phase microextraction
(SPME) followed by gas chromatography-mass spectrometry (GC-MS)
analysis. The SPME measurements coupled to GC-MS (Fig. 1a)
showed that the addition of a second filter paper containing pure
DEET in the stimulus pipette had no significant effect on the release
of 1-octen-3-ol (10⁻² dilution). Thus, we can rule out any fixative role
of DEET under the conditions used here.

We next performed extracellular recordings to measure the effect of 
DEET on responses elicited by odours in Drosophila OSNs housed 
within the ab2 (Fig. 1a and Supplementary Fig. 1) or ab3 (Supplementary
Fig. 2) olfactory hairs, or sensilla, on the fly antenna. Each of these
sensilla houses two OSNs expressing different odorant receptors with
unique odour response profiles. We measured the activity of these
OSNs simultaneously and compared their responses to odour with and 
without co-presentation of DEET (Fig. 1b, c). 

The effect of DEET on four OSNs stimulated with ten structurally 
diverse odours was complex and dependent on odorant receptor, 
odour and concentration. In some OSNs, DEET suppressed odour-mediated
inhibition (Fig. 1d, f and Supplementary Fig. 1a), in others it
decreased odour-induced activation (Fig. 1e, Supplementary Fig. 1b, d,
e and Supplementary Fig. 2a-g) and in others it had no effect (Fig. 1g
and Supplementary Figs 1c and 2h-j). Moreover, the effects of DEET
were strongly concentration dependent, such that high odour concen- 
trations often overcame the effects of DEET (Fig. 1 and Supplementary 
Figs 1 and 2). DEET presented alone, without odour stimuli, elicited no 
response above that evoked by solvent in ab2A and ab3A neurons, 
slightly activated ab2B neurons and slightly inhibited ab3B neurons; 
but responses were considerably smaller than those elicited by cognate 
odour ligands (Supplementary Fig. 3). Therefore, DEET alone has a 
negligible effect on olfactory responses in ab2 and ab3 neurons. 

Notably, 1-octen-3-ol presented in a dilution of 10⁻² had opposite
effects on the two neurons housed in ab2 sensilla, inhibiting the ab2A
neuron expressing OR59B-ORCO (Fig. 1d) and activating the ab2B
neuron expressing OR85A-ORCO (Fig. 1e). Co-application of DEET
inverted OSN responses to odour, leading to activation of the ab2A
neuron (Fig. 1d) and suppressing the odour-induced activation of the
ab2B neuron (Fig. 1e). Similar opposite effects of DEET were observed
when the ab2 sensillum was stimulated with a different odour, 1-octanol 
(Supplementary Fig. 1a, b). 

Taken together, our results support the hypothesis that DEET acts 
as a molecular confusant, scrambling the Drosophila odour code by 
direct modulation of odorant receptor activity dependent on the type 
of odour and its concentration (Fig. 1h). Recent work examining the 
effect of DEET on mosquito odorant receptors in heterologous cells 
supports this hypothesis.

Figure 1 | DEET scrambles the Drosophila odour code. a, SPME and GC-MS 
quantitation of 10⁻² 1-octen-3-ol emitted from the stimulus pipette in the 
absence (cyan bar) or presence (blue bar) of pure DEET. Data represent peak 
area (NS, not significant; t-test; mean ± s.e.m., n = 5). b, c, Representative 
traces of single-sensillum recordings from OR59B-ORCO in the ab2A OSN 
(red spikes) and OR85A-ORCO in the ab2B OSN (black spikes), stimulated by 
10⁻² 1-octen-3-ol with (b) and without (c) DEET, were recorded 
simultaneously and subsequently separated using spike-sorting algorithms. 
Bars represent 1-s odour stimulus. The delayed onset of odour response is a 
function of the odour delivery system. d-g, Dose-response curves of 
OR59B-ORCO in ab2A (d, f, g) and OR85A-ORCO in ab2B (e), stimulated with 
increasing concentrations of 1-octen-3-ol (d, e), linalool (f) and methyl acetate 
(g) in the absence (light colour) or presence (dark colour) of DEET. Bar plots 
next to the dose-response curves represent responses to the solvent paraffin oil 
(PO) in the absence (grey bar) or presence (black bar) of DEET (**P < 0.01, 
***P < 0.001; F-test with Bonferroni correction; mean ± s.e.m., n = 8-22). Δ, 
relative response (Methods); v/v, volume concentration. h, Summary of effects 
of DEET on the Drosophila ab2 and ab3 odour code derived from 
dose-response curves in d-g and Supplementary Figs 1 and 2. The significance of the 
change in response due to co-application of odorant and DEET was assessed 
using an F-test. NA, not applicable. 

Because the effects of DEET varied with the specific OSN and odour 
tested, it seems unlikely that DEET acts directly and solely on the 
conserved co-receptor ORCO, which is co-expressed in all the OSNs 
examined here. To determine whether DEET acts on the odour-specific 
odorant receptor subunit, we focused on the pharmacology of 
the OR59B-ORCO complex in ab2A OSNs. 1-octen-3-ol inhibits basal 
activity of OR59B-ORCO at low concentrations but acts as an agonist 
at high concentrations (Fig. 1d). DEET interfered with inhibition of 
OR59B-ORCO by 1-octen-3-ol, 1-octanol and linalool, but had no 
effect on odour-dependent activation by methyl acetate and 
2,3-butanedione (Fig. 1g and Supplementary Fig. 1c). Notably, DEET 
had no effect on the OR59B-ORCO activation seen at higher 
concentrations of 1-octen-3-ol. This selective effect on inhibition might be 
explained by the presence on the OR59B receptor of two distinct 
1-octen-3-ol-interaction sites: a high-affinity site that inhibits the odorant 
receptor complex and is modulated by DEET, and a low-affinity 
DEET-independent site that activates the odorant receptor complex. 
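
The two-site interpretation can be illustrated with a toy calculation. The sketch below is not a model taken from this study: it assumes that each site follows a Hill-type occupancy curve and that inhibition only matters while the activating site is unoccupied. The function names, affinities and amplitudes are invented values chosen only to reproduce the qualitative pattern (inhibition at low dilution, activation at high dilution, and a DEET effect confined to the inhibitory range).

```python
def hill(conc, k_half, n=1.0):
    """Fractional occupancy of a site with half-saturation k_half (Hill form)."""
    return conc**n / (k_half**n + conc**n)


def two_site_response(conc, deet=False,
                      k_inhib=1e-4, k_activ=1e-1,
                      max_inhib=-20.0, max_activ=100.0):
    """Toy response (spikes/s relative to baseline) to an odour dilution (v/v)."""
    # Low-affinity site: excitatory and assumed to be DEET-independent.
    activ_occ = hill(conc, k_activ)
    # High-affinity site: inhibitory and assumed to be blocked by DEET.
    inhib_occ = 0.0 if deet else hill(conc, k_inhib)
    # Assumption: inhibition only acts while the activating site is unoccupied.
    return max_activ * activ_occ + max_inhib * inhib_occ * (1.0 - activ_occ)


if __name__ == "__main__":
    # Columns: dilution, response without DEET, response with DEET.
    for conc in (1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0):
        without = two_site_response(conc)
        with_deet = two_site_response(conc, deet=True)
        print(f"{conc:g}\t{without:7.2f}\t{with_deet:7.2f}")
```

With these invented parameters the response is negative at intermediate dilutions without DEET, becomes weakly positive when DEET is present, and is nearly identical with and without DEET at the highest concentration.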

To investigate the mechanistic basis of OR59B modulation by 
DEET, we turned to analysis of this receptor in D. melanogaster strains 
collected around the world. Polymorphisms in natural populations 
have been previously connected to different sensitivity to odours in 
humans’’”®, and oxygen and carbon dioxide sensing in the nematode 
Caenorhabditis elegans’'. We reasoned that naturally occurring 
polymorphisms in insect odorant receptors might modify odorant 
receptor/odorant interaction sites and affect their sensitivity to DEET. To 
search for putative polymorphisms that affect DEET responses, we 
assessed responses of ab2A neurons to 1-octen-3-ol at 10⁻² dilution 
in the absence or presence of DEET in 18 wild-type D. melanogaster 
strains from locations around the world, and compared these responses 
with those obtained in the w1118 laboratory control strain 
(Fig. 2a, b and Supplementary Fig. 4a). In each strain, ab2 sensilla were 
identified by the characteristic size and location of the sensilla and 
responses of the ab2A cell to its cognate ligand, methyl acetate (data 
not shown). In 17 of the 18 strains, DEET increased responses of ab2A 
neurons to 10⁻² 1-octen-3-ol (Fig. 2b). However, ab2A neurons in the 
Brazilian strain Boa Esperança were not inhibited by 1-octen-3-ol at 
any concentration tested and were therefore insensitive to modulation 
by DEET (Figs 2c and 3a, b and Supplementary Fig. 4b). In addition to 
the loss of inhibition by 1-octen-3-ol, the ab2A cell in the Brazilian 
strain showed robust activation by 1-octanol and ethyl hexanoate, 
odours that normally inhibit the ab2A cell in wild-type strains. 
Inhibition by linalool was equivalent in wild-type and Boa Esperança 
strains (Fig. 3e). Excitatory responses to methyl acetate, ethyl acetate 
and 2,3-butanedione, both in the absence and presence of DEET, did 
not differ when compared with the corresponding w1118 neuron 
(Fig. 3c, d and Supplementary Fig. 5; data not shown). In control 
experiments, we confirmed that the odour response profiles of ab2A 
and ab2B OSNs in the Brazilian strain are otherwise similar to those of 
our w1118 control strain (Fig. 3f and Supplementary Fig. 5). 

We proposed that a genetic polymorphism in Or59b in the Boa 
Esperança strain may account for the changed responses to odour 
and DEET. We therefore sequenced and compared the coding region 
of Or59b in the 19 strains with the published Or59b sequence (NCBI 
reference sequence, NP_523822.1), and found seven missense poly- 
morphisms and 36 silent polymorphisms among all strains (Sup- 
plementary Table 1 and Supplementary Fig. 6). The protein sequence 
of OR59B in Boa Esperança is referred to as OR59B Boa and varies from 
the NCBI reference at four amino-acid residues (Val41Phe, Val91Ala, 
Tyr376Ser and Val388Ala). Among these, two are unique to this strain: 
Val41Phe, located in the amino terminus near transmembrane 
domain 1 (TM1), and Val91Ala, located within TM2 (Fig. 4a, b and 
Supplementary Fig. 6). On the basis of our within-strain sampling, we 
detected only one protein variant per strain except for the w1118 control 
strain, for which we identified two sequences: one identical to the 
published OR59B sequence (OR59B NCBI REF) and one containing 
two missense changes (Fig. 4a and Supplementary 
Table 1). We analysed electrophysiological recordings obtained 
from the w1118 control strain for each odour tested and found no 
evidence that the responses sort into two phenotypically separable 
clusters. Therefore, we assume that the two w1118 OR59B 
haplotypes are functionally equivalent, at least for 
the odours tested in this study. The coding sequences of Orco in the 
w1118 and Boa Esperança strains did not differ from the NCBI 
reference (data not shown), which suggests that the protein sequence 
variations in the odour-specific subunit OR59B, rather than the co- 
receptor ORCO, eliminate inactivation by low concentrations of 
1-octen-3-ol and thereby render the odorant receptor complex 
insensitive to modulation by DEET. 

Figure 2 | OR59B-ORCO sensitivity to DEET varies across wild-type D. 
melanogaster strains. a, Schematic of the screening protocol: 10⁻² 
1-octen-3-ol was delivered in the absence and presence of DEET. b, c, Bar plots of 
odour-evoked responses of the w1118 strain (b) and 18 wild-type strains (c) to 10⁻² 
1-octen-3-ol in the absence (light blue) or presence (dark blue) of DEET (t-test 
with Bonferroni correction; mean ± s.e.m., n = 10-17). 

To test the functional consequences of the four OR59B missense 
changes in the Boa Esperança strain, we generated transgenic flies 
carrying receptor variants each containing one of the four changes 
(Val41Phe, Val91Ala, Tyr376Ser or Val388Ala), a combination of 
the two unique to Boa Esperança (Val41Phe and Val91Ala) or those 
shared with other strains (Tyr376Ser and Val388Ala), based on the 
OR59B NCBI REF backbone. OR59B variants were selectively expressed 
in the Drosophila Δhalo ‘empty neuron’ system'””, in which the 
endogenous odour-specific odorant receptors in ab3A OSNs were replaced 
with our OR59B mutants (Fig. 4c and Supplementary Fig. 7). As 
expected, 10⁻² 1-octen-3-ol caused inhibition of ab3A neurons 
expressing OR59B NCBI REF and activation of ab3A neurons 
expressing OR59B Boa (Fig. 4c). Whereas three OR59B variants lacking 
the Val91Ala change showed normal inhibition to this odour, any variant 
of OR59B containing the Val91Ala change showed a loss of odour 
inhibition by 1-octen-3-ol and insensitivity to DEET (Fig. 4c). This 
demonstrates that the Val91Ala change is sufficient to phenocopy the 
electrophysiological properties of the endogenous Boa Esperança 
OR59B (Fig. 4c). It has previously been shown that responses of 
OR59B expressed in the empty neuron faithfully recapitulate receptor 
function measured in the endogenous ab2A neuron’*. We therefore 
assume that a strain carrying only the OR59B V91A polymorphism 
would have the same phenotype as Boa Esperança. 

Figure 3 | OR59B-ORCO neurons in the Boa Esperança strain are 
insensitive to modulation by DEET. a-d, Dose-response curves of the 
OR59B-ORCO ab2A OSN in wild-type w1118 (solid line) and Boa Esperança 
(dashed line) strains stimulated with increasing concentrations of 1-octen-3-ol 
(a, b) or methyl acetate (c, d), with (b, d) or without (a, c) DEET (F-test with 
Bonferroni correction; mean ± s.e.m., n = 5-14). The dose-response curve of 
w1118 to 1-octen-3-ol in a and b is reproduced from Fig. 1d for comparison. Bar 
plots next to the dose-response curves represent responses to the solvent 
paraffin oil in the absence (grey bar) or presence (black bar) of DEET (F-test 
with Bonferroni correction; mean ± s.e.m., n = 5-11). e, f, Bar plots comparing 
responses of OR59B-ORCO in ab2A (e) and OR85A-ORCO in ab2B (f) in 
w1118 (solid bar) and Boa Esperança (dashed bar) strains to single dilutions of 
1-octen-3-ol, 1-octanol, ethyl hexanoate and linalool (t-test with Bonferroni 
correction; mean ± s.e.m., n = 9-11). 

Figure 4 | A single natural polymorphism in OR59B confers insensitivity to 
DEET. a, Haplotype network of OR59B protein variants. Each circle represents 
a unique OR59B protein variant, its size proportional to the number of strains 
containing each variant. Connecting lines show the amino-acid substitutions 
that separate each variant. The bold circle represents the OR59B NCBI REF variant 
with NCBI accession code NP_523822.1. The Boa Esperança strain is shown 
in red. b, Snake plot of OR59B showing the location of missense 
polymorphisms. Changes that differentiate Boa Esperança from the NCBI 
reference are shown in red. c, Bar plots show the responses of Or59b variants 
ectopically expressed in ab3A neurons lacking endogenous OR22A and OR22B 
to 10⁻² 1-octen-3-ol in the absence (light blue) or presence (dark blue) of 
DEET. The locations of variant amino acids in OR59B are depicted in the 
cartoon snake plot on top of each set of bar graphs (t-test with Bonferroni 
correction; mean ± s.e.m., n = 7-11). 

DEET shows behavioural efficacy in insects as diverse as 
Drosophila®’ and mosquitoes’ ***""'. We have shown that a single, 
naturally occurring polymorphism in an odour-specific odorant 
receptor can modify receptor interactions with an inhibitory odour 
and render the receptor insensitive to modulation by DEET. These 
results provide compelling evidence that DEET interacts directly with 
an odour-specific odorant receptor. Consistent with this, recent work 
showed that an odour-specific OR subunit is required for the 
behavioural effects of DEET on mosquito larvae’’. Our data imply a 
complexity in ligand-binding interactions within a single insect odorant 
receptor complex that bears further investigation. The Val91Ala 
polymorphism is located in the second predicted transmembrane domain 
but little is known about which domains of this novel class of 
odour-gated ion channels contribute to ligand binding or ion channel 
function’’”*. A recent study implicated the third predicted transmembrane 
domain of an insect odorant receptor in ligand interactions™, and 
additional structure-function work of this nature will ultimately reveal 
how these membrane proteins interact with odorants and modulators 
including DEET. Although Val and Ala are both amino acids with 
small aliphatic side chains, Val-to-Ala substitutions have been shown 
to affect other cation channels’. It is therefore plausible that this 
change would affect the function of the odour-gated ion channel subunit 
encoded by OR59B. We speculate that the Val91Ala polymorphism 
inactivates a high-affinity binding site for 1-octen-3-ol that locks the 
receptor into a closed configuration at low odour concentration. A 
separate, low-affinity binding site on the receptor would 
lead to activation. In this model, DEET would selectively interfere 
with the high-affinity binding site. Future investigation of the 
structure-function relation of this receptor is needed to test these ideas. Genetic 
insensitivity to DEET has previously been shown to exist in both 
Drosophila flies’ and Aedes aegypti mosquitoes’ but the genes respons- 
ible remain unknown. It will be interesting to investigate whether accu- 
mulated odorant receptor polymorphisms contribute to these 
phenotypes. 

It has recently been proposed that DEET directly activates beha- 
vioural repulsion through the activation of odorant receptors that medi- 
ate avoidance behaviours*'°. The insect odorant receptor repertoire is 
highly diverse with very low protein similarity across insect species****. 
Furthermore, different species respond very selectively to host odour 
cues that meet disparate ecological needs*”*. It seems unlikely that a 
single molecule like DEET would activate a different yet similarly potent 
repulsive behaviour in all insects tested. Instead, our data support the 
hypothesis that DEET is a broad-selectivity insect odorant receptor 
modulator that alters the fine-tuning of the insect olfactory system. 
DEET-mediated scrambling of the odour code would interfere with 
behavioural responses as diverse as mosquitoes orienting to host odours 
produced by humans” and the attraction of Drosophila to yeast on 
rotting fruit”. 


METHODS SUMMARY 

Fly strains and molecular biology. D. melanogaster stocks were maintained on 
conventional cornmeal-agar-molasses medium in a 12-h-light, 12-h-dark cycle at 
25 °C. Details of molecular biology manipulations, all primers and fly strains are in 
Methods. 

Single-sensillum extracellular recordings. Recordings of female fly antennae 
were performed as described previously’ and are detailed in Methods. The 
respective amounts of 1-octen-3-ol emitted from the stimulus pipettes with and without 
DEET were investigated through SPME and linked GC-MS analysis as detailed in 
Methods. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS 

Genomic DNA. DNA was prepared according to the Quick Fly Genomic DNA 
Prep protocol from the Berkeley Drosophila Genome Project 
(http://www.fruitfly.org/about/methods/inverse.pcr.html). DNA (1.5 µl) was used for amplification 
using the KOD PCR Kit (Novagen). For Or59b, primers were designed to anneal 
to the 5' and 3' untranslated regions of the w1118 Or59b locus: 
5'-gaattcTCCGGGTATAAAGTGCAGGTGCTGGCACCG-3' (forward); 
5'-ctcgagGCTCTTTTTTGCGGGGGCTCATGGGTGCAG-3' (reverse). 

Orco was amplified using primers that amplify the complete coding region: 
5'-gaattcATGACAACCTCGATGCAG-3' (forward); 
5'-caattgCTTGAGCTGCACCAGCACCA-3' (reverse). 

PCR products were cloned into pGEM-T Easy (Promega Corporation), 
sequenced (GENEWIZ, Inc.) and analysed using SeqMan software (DNASTAR, 
Inc.). For each strain, at least four independent samples were analysed, derived 
from at least two different genomic preparations and two different PCR reactions. 
These were sequenced and compared to NCBI reference sequences for each gene 
(Or59b: NM_079098.1; Orco: NM_079511.4). 

Complementary DNA preparation and transgenic flies. Total RNA was 
extracted from w1118 and Boa Esperança antennae using the RNeasy Mini Kit 
(QIAGEN). 

Complementary DNA (cDNA) synthesis was performed according to the 
SuperScript III First-Strand Synthesis System for RT-PCR (Invitrogen) using 
oligo(dT) primers. Or59b cDNA from both w1118 and Boa Esperança was 
amplified using these gene-specific primers: 
5'-gaattcATGGCGGTGTTCAAGCTAATCAAACCG-3' (forward); 
5'-ctcgagTTACTGGAACTGCTCGGCCAGATTCA-3' (reverse). 

PCR products representing full-length w1118 Or59b NCBI REF and Or59b Boa 
cDNAs were cloned into pGEM-T Easy, completely sequenced and subcloned 
into the pUAST attB vector*' using EcoRI and XhoI restriction sites. 

Single point mutations were introduced into the w1118 Or59b NCBI REF cDNA by 
directed PCR mutagenesis. Two independent reactions were prepared: the first 
contained the forward primer with the desired mutation and the reverse SP6 vector 
primer (5'-ATTTAGGTGACACTATAG-3'); the second contained the reverse 
mutating primer and the forward T7 vector primer 
(5'-TAATACGACTCACTATAGGG-3'). PCR products from the reactions were purified and 1 µl of each 
was used as a template and mixed in a second round of amplification with T7 and 
SP6 primers to obtain the full gene. For each mutagenesis, the final PCR product 
was purified and subcloned in pGEM-T Easy, and the complete Or59b cDNA 
carrying the induced mutations was sequenced for verification and compared with 
the Or59b NCBI REF sequence. 

The double mutants Or59b V41F V91A and Or59b T376S V388A were generated using 
Or59b V41F or Or59b T376S as a template and a second round of mutagenesis 
with the corresponding primers. 

The following primers were used. 
Or59b V41F: 5'-CCGCCGAAGGAGGGATTCCTGCGCTACGTGT-3' (forward); 
5'-ACACGTAGCGCAGGAATCCCTCCTTCGGCGG-3' (reverse). 
Or59b V91A: 5'-AGGTGTGCATCAATGCGTATGGCGCCTCGG-3' (forward); 
5'-CCGAGGCGCCATACGCATTGATGCACACCT-3' (reverse). 
Or59b T376S: 5'-TGAACAGCAACATAAGCGTGGCCAAGTTCGC-3' (forward); 
5'-GCGAACTTGGCCACGCTTATGTTGCTGTTCA-3' (reverse). 
Or59b V388A: 5'-GCATCATTACAATAGCGCGACAAATGAATCT-3' (forward); 
5'-AGATTCATTTGTCGCGCTATTGTAATGATGC-3' (reverse). 
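
Each forward and reverse mutagenic primer listed above should be the reverse complement of its partner, which gives a quick way to catch transcription errors in the sequences. The following is a minimal sketch of that check; the helper names and the dictionary keys (which simply number the pairs in the order listed) are illustrative and not part of the original protocol.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# (forward, reverse) primer pairs, copied 5' to 3' from the list above.
PRIMERS = {
    "pair1": ("CCGCCGAAGGAGGGATTCCTGCGCTACGTGT",
              "ACACGTAGCGCAGGAATCCCTCCTTCGGCGG"),
    "pair2": ("AGGTGTGCATCAATGCGTATGGCGCCTCGG",
              "CCGAGGCGCCATACGCATTGATGCACACCT"),
    "pair3": ("TGAACAGCAACATAAGCGTGGCCAAGTTCGC",
              "GCGAACTTGGCCACGCTTATGTTGCTGTTCA"),
    "pair4": ("GCATCATTACAATAGCGCGACAAATGAATCT",
              "AGATTCATTTGTCGCGCTATTGTAATGATGC"),
}


def check_pairs(primers):
    """Map each pair name to True if the reverse primer mirrors the forward one."""
    return {name: reverse_complement(fwd) == rev
            for name, (fwd, rev) in primers.items()}


if __name__ == "__main__":
    for name, ok in check_pairs(PRIMERS).items():
        print(name, "complementary" if ok else "MISMATCH")
```

A mismatch would point to a typing or extraction error in one of the two sequences rather than to a problem with the mutagenesis design itself.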

Transgenic animals were generated in the w1118 genetic background (Genetic 
Services, Inc.) using the phiC31-based integration system”' targeted at the attP2 
docking site on chromosome III (ref. 32). 

Fly stocks. Drosophila melanogaster stocks were maintained on conventional 
cornmeal-agar-molasses medium in a 12-h-light, 12-h-dark cycle at 25 °C. The 
w1118 strain was used as wild-type control. 

The following wild-type strains were used: Akayu [Drosophila Genetic Resource 
Center (DGRC) #103389; origin, Japan]; Algeria (isogenic for II and III chromo- 
somes, DGRC #103390; origin, Algeria); Alma-Ata (DGRC #103391; origin, 
Kazakhstan); Canton-S (isogenic for II and III, lab stock; origin, Ohio, USA); 
CA1 (Bloomington Drosophila Stock Center #3846; origin, Cape Town, South 
Africa); Coffs Harbour (DGRC #103411; origin, New South Wales, Australia); 
Kericho-7B (DGRC #103428; origin, Kericho, Kenya); Manago (isogenic for II 
and III, DGRC #103433; origin, Hawaii, USA); Oregon-R (isogenic for II and III, 
lab stock; origin, Oregon, USA); San Miguel (isogenic for II and III, DGRC 
#103450; origin, Buenos Aires, Argentina); WT Berlin (isogenic for II and III, 
Heisenberg laboratory, Würzburg, Germany; origin, Berlin, Germany); Batumi-L 
(DGRC #103396; origin, Batumi, Georgia); Boa Esperança (DGRC #103400; origin, 
Minas Gerais, Brazil); BOG2 (Bloomington #3842; origin, Bogota, Colombia); CO3 
(Bloomington #3848; origin, Commack, New York, USA); EV (Bloomington 
#3851; origin, Ellenville, New York, USA); Medvast-21 (DGRC #103435; origin, 
Finland); VAG 2 (Bloomington #3876; origin, Athens, Greece). 

The following mutant alleles and transgenic flies were used: Or22a/bΔhalo (ref. 
33) and Or22a-Gal4 (ref. 34). The genotypes of the flies used for Fig. 4c and 
Supplementary Fig. 8 are as follows: 
Or22a/bΔhalo;Or22a-Gal4/UAS-Or59b (labelled Or59b NCBI REF in the figure), 
Or22a/bΔhalo;Or22a-Gal4/UAS-Or59b V41F (V41F), 
Or22a/bΔhalo;Or22a-Gal4/UAS-Or59b V91A (V91A), 
Or22a/bΔhalo;Or22a-Gal4/UAS-Or59b V41F V91A (V41F V91A), 
Or22a/bΔhalo;Or22a-Gal4/UAS-Or59b T376S (T376S), 
Or22a/bΔhalo;Or22a-Gal4/UAS-Or59b V388A (V388A), 
Or22a/bΔhalo;Or22a-Gal4/UAS-Or59b T376S V388A (T376S V388A) and 
Or22a/bΔhalo;Or22a-Gal4/UAS-Or59b V41F V91A T376S V388A (V41F V91A T376S V388A). 

SPME quantification of emitted volatiles. The effect of DEET on the amount of 
1-octen-3-ol emitted from the stimulus pipettes was investigated through SPME and 
linked GC-MS analysis. Stimulus pipettes, prepared as per the electrophysiology 
experiments, were loaded with one filter strip impregnated with 5 µl of 
1-octen-3-ol (10⁻²) and a second strip containing either 5 µl of paraffin oil 
or 5 µl of pure DEET. The pipettes were connected 
to a stimulus controller (Syntech CS 55; www.syntech.nl) and volatiles emitted from 
the pipettes during ten puffs, of 2-s duration each, delivered with 1-s intervals, were 
trapped on a SPME fibre (Supelco blue fibre; 57310-U; polydimethylsiloxane/divinylbenzene, 
65-µm coating; http://www.sigmaaldrich.com), inserted 2 cm into the 
pipette tip. After completion of the stimulus cycle, the SPME fibres were immediately 
retracted and injected into a GC-MS device for quantification. This device (Agilent 
GC6890N fitted with MS5975B unit; http://www.agilent.com) was equipped with a 
HP5-MS column (Agilent Technologies) and operated as follows. The inlet 
temperature was set to 250 °C. Desorption time was 1 min. The temperature of the gas 
chromatography oven was held at 70 °C for 2 min and then increased by 
20 °C min⁻¹ to 280 °C, with the final temperature held for 2 min. For mass 
spectroscopy, the transfer line was held at 280 °C, the source at 230 °C and the quad at 
150 °C. Mass spectra were taken in EI mode (at 70 eV) in the range from m/z 33 to 
m/z 350, with a scanning rate of 4.42 scans per second. GC-MS data were processed 
with the MDS-CHEMSTATION software (Agilent Technologies), and peak areas 
were autointegrated. Five replicates were collected for each condition and data were 
plotted as mean ± s.e.m. Statistical significance was assessed using a t-test. 
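
As a rough check on the timings implied by the SPME protocol, the arithmetic can be written out explicitly. This is only a sketch: the function names are illustrative, the oven calculation ignores desorption, cool-down and equilibration time, and the stimulus calculation assumes that the 1-s intervals fall only between consecutive puffs.

```python
def oven_run_minutes(start_c, end_c, ramp_c_per_min, initial_hold_min, final_hold_min):
    """Total length of a hold, linear ramp, hold oven programme in minutes."""
    ramp_min = (end_c - start_c) / ramp_c_per_min
    return initial_hold_min + ramp_min + final_hold_min


def stimulus_cycle_seconds(n_puffs, puff_s, interval_s):
    """Length of a puff train, assuming intervals only between puffs."""
    return n_puffs * puff_s + (n_puffs - 1) * interval_s


# Values taken from the protocol text: 70 °C for 2 min, 20 °C/min to 280 °C,
# final hold of 2 min; ten 2-s puffs separated by 1-s intervals.
run_min = oven_run_minutes(70.0, 280.0, 20.0, 2.0, 2.0)
cycle_s = stimulus_cycle_seconds(10, 2.0, 1.0)

if __name__ == "__main__":
    print(f"oven programme: {run_min} min; stimulus cycle: {cycle_s} s")
```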

Electrophysiology and odorants. Female transgenic flies were recorded at 5 d 
after adult eclosion. All other flies were recorded at 5-10 d after adult eclosion. 
Single-sensillum recordings were performed as described previously**”*. For each 
experiment in which we recorded OR59B variants expressed in the ab3A neuron, 
we verified that responses of endogenous OR59B in the native ab2A neuron 
showed normal inhibition by 10⁻² 1-octen-3-ol (data not shown). Odorants were 
obtained from Sigma-Aldrich at high purity and diluted (v/v) in paraffin oil as 
indicated. DEET was obtained from Alfa Aesar and was applied undiluted. 
Chemical Abstracts Service (CAS) numbers are as follows: paraffin oil (8012- 
95-1); 1-octen-3-ol (3391-86-4); pentanal (110-62-3); pentanoic acid (109-52-4); 
2-heptanone (110-43-0); 1-octanol (111-87-5); (−)-linalool (126-91-0); methyl 
acetate (79-20-9); 2,3-butanedione (431-03-8); ethyl hexanoate (123-66-0); butyr- 
aldehyde (123-72-8); ethyl-3-hydroxybutyrate (5405-41-4); ethyl acetate (141-78- 
6); hexanol (111-27-3); DEET (134-62-3). 

The desired odour dilution (30 µl) was pipetted onto a filter paper strip (3 mm 
× 50 mm) and 30 µl of undiluted DEET or paraffin oil solvent was pipetted onto a 
second filter paper strip. Both filter paper strips were then carefully inserted into a 
glass Pasteur pipette. Before any recordings, charcoal-filtered air was forced 
through the pipette for 1-3 s to remove dead space in the odour delivery system. 
For actual recordings, charcoal-filtered air was continuously applied to the insect 
antenna, with odour delivered through the pipette to the fly antennae for 1 s. Each 
pipette was used at most three times and no more than three sensilla were tested 
per animal. Sensilla types were identified by size, location on the antenna and 
responsiveness to known preferred odorants”. 

Data were collected using AUTOSPIKE (Syntech) and analysed by custom spike- 
sorting algorithms*. Responses were initially classified as excitatory or inhibitory by 
visual inspection of the responses after odour application. An odour was classified as 
excitatory if it increased the spontaneous firing rate and inhibitory if it decreased the 
spontaneous firing rate. The data were then analysed by subtracting average spon- 
taneous activity (expressed as spikes per second) in the 15 s before odour application 
from activity either in the first 600 ms after odour delivery, for excitatory odorants, 
or in the first 1 s, for inhibitory odorants. This value is referred to as Δ, and will 
typically have a negative value for inhibitory odorants and a positive value for 
excitatory odorants. The onset of odour-evoked responses varied owing to slight 
variations in the position of the odour delivery system relative to the sensillum being 
recorded. To correct for this, we calibrated the inferred odour onset on the basis of 
excitatory responses elicited by control stimuli applied at the beginning of each trial 
(ab2, 10° methyl acetate; ab3, 10° 2-heptanone). 
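
The relative-response calculation described above can be sketched as follows. This is an illustration rather than the analysis code used in the study: the function name, the representation of a recording as a list of spike times (in seconds, with 0 at the calibrated odour onset) and the example spike trains are all assumptions.

```python
def delta_response(spike_times, excitatory, baseline_s=15.0):
    """Return the relative response in spikes per second.

    spike_times: spike times in seconds, with 0.0 at the calibrated odour onset.
    excitatory: True scores the first 600 ms, False scores the first 1 s.
    """
    window_s = 0.6 if excitatory else 1.0
    # Spontaneous activity in the 15 s before odour application.
    n_baseline = sum(1 for t in spike_times if -baseline_s <= t < 0.0)
    # Activity in the scoring window after odour delivery.
    n_evoked = sum(1 for t in spike_times if 0.0 <= t < window_s)
    return n_evoked / window_s - n_baseline / baseline_s


# Made-up spike trains: 10 spikes/s of spontaneous activity before onset.
baseline = [k / 10 - 15 for k in range(150)]
excited = baseline + [k * 0.02 for k in range(30)]  # 30 spikes within 0.6 s
inhibited = baseline + [0.25, 0.75]                 # 2 spikes within 1 s

if __name__ == "__main__":
    print(delta_response(excited, excitatory=True))     # positive value
    print(delta_response(inhibited, excitatory=False))  # negative value
```

In this toy example the excitatory train gives a positive value and the inhibitory train a negative one, matching the sign convention stated in the text.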

Statistical analysis. Dose-response curves were fitted with ORIGINPRO 8 
(OriginLab) using a logistic function, except for responses to 1-octen-3-ol in 
Fig. 1d, which used a biphasic function. 

Comparisons of paired dose-response curves in Figs 1 and 3 and Supplementary 
Figs 1, 2 and 4 used an F-test to assess the statistical significance of 
differences between the two curve fits. A two-tailed t-test was performed for all 
comparisons in Fig. 1a (non-paired), Figs 2-4 and Supplementary Figs 3, 4 and 7 
(paired). Type I errors were addressed by using a Bonferroni correction for mul- 
tiple comparisons applied to each set of experiments. Data in Supplementary Fig. 6 
were fitted using a linear regression analysis. 
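
For readers unfamiliar with the Bonferroni correction mentioned above, a minimal sketch is given here. It is not the authors' analysis script: the function name, the 0.05 threshold and the example P values are illustrative assumptions.

```python
def bonferroni(p_values, alpha=0.05):
    """Return Bonferroni-adjusted P values and significance flags.

    Each raw P value is multiplied by the number of comparisons in the set
    (capped at 1.0) and then compared with alpha.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p_adj < alpha for p_adj in adjusted]
    return adjusted, significant


if __name__ == "__main__":
    # Hypothetical raw P values from three comparisons in one experiment set.
    adjusted, significant = bonferroni([0.001, 0.02, 0.04])
    for p_adj, sig in zip(adjusted, significant):
        print(f"{p_adj:.3f}", "significant" if sig else "not significant")
```

Applying the correction per set of experiments, as described in the text, means that the multiplier is the number of comparisons within that set rather than across the whole study.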

The OR59B snake plots in Fig. 4 and Supplementary Fig. 7 were hand- 
composed on the basis of transmembrane domain predictions generated with 
the PredictProtein algorithm”’. 

STING is a direct innate immune sensor of 

The innate immune system detects infection by using germline- 
encoded receptors that are specific for conserved microbial molecules. 
The recognition of microbial ligands leads to the production of 
cytokines, such as type I interferons (IFNs), that are essential for 
successful pathogen elimination. Cytosolic detection of pathogen- 
derived DNA is one major mechanism of inducing IFN produc- 
tion’”, and this process requires signalling through TANK binding 
kinase 1 (TBK1) and its downstream transcription factor, IFN- 
regulatory factor 3 (IRF3). In addition, a transmembrane protein 
called STING (stimulator of IFN genes; also known as MITA, ERIS, 
MPYS and TMEM173) functions as an essential signalling adaptor, 
linking the cytosolic detection of DNA to the TBK1-IRF3 signalling 
axis*’. Recently, unique nucleic acids called cyclic dinucleotides, 
which function as conserved signalling molecules in bacteria®, have 
also been shown to induce a STING-dependent type I IFN response”*. 
However, a mammalian sensor of cyclic dinucleotides 
has not been identified. Here we report evidence that STING itself 
is an innate immune sensor of cyclic dinucleotides. We demonstrate 
that STING binds directly to radiolabelled cyclic diguanylate 
monophosphate (c-di-GMP), and we show that unlabelled cyclic 
dinucleotides, but not other nucleotides or nucleic acids, compete 
with c-di-GMP for binding to STING. Furthermore, we identify 
mutations in STING that selectively affect the response to cyclic 
dinucleotides without affecting the response to DNA. Thus, 
STING seems to function as a direct sensor of cyclic dinucleotides, 
in addition to its established role as a signalling adaptor in the IFN 
response to cytosolic DNA. Cyclic dinucleotides have shown promise 
as novel vaccine adjuvants and immunotherapeutics””’, and our 
results provide insight into the mechanism by which cyclic dinucleo- 
tides are sensed by the innate immune system. 

Nucleotides are crucial signalling molecules in all domains of life, but 
cyclic dinucleotides seem to be produced solely by Bacteria and 
Archaea. For example, c-di-GMP is a ubiquitous second messenger 
that regulates biofilm formation, motility and virulence in a diverse 
range of bacterial species*. Recently, cyclic diadenylate monophosphate 
(c-di-AMP) was discovered to be a bacterial regulatory molecule”, 
although its role remains to be fully characterized. Because they are 
unique to microorganisms, cyclic dinucleotides are appropriate targets 
for immune recognition’*. Indeed, the induction of IFN production by 
Listeria monocytogenes depends on bacterial secretion of cyclic- 
di-AMP’”. However, it remains unclear how cyclic dinucleotides are 
sensed in mammalian cells. 

To address the mechanism by which mammalian cells sense cyclic 
dinucleotides, we first confirmed that cyclic dinucleotides are detected 
in the host cell cytosol’? by expressing RocR, a c-di-GMP-specific 
phosphodiesterase from Pseudomonas aeruginosa, in the cytosol of 
macrophages. In these cells, IFN induction in response to c-di-GMP 
(but not other stimuli) is tenfold lower than that in control, vector- 
transduced, cells (Fig. 1a), confirming that the cytosolic presence of 
c-di-GMP is important for inducing IFN. 


To identify candidate cyclic dinucleotide sensors, we sought to 
identify molecules that could reconstitute the IFN response to cyclic 
dinucleotides in HEK293T cells, which do not respond to c-di-GMP"””. 
Because STING is essential for the IFN response to cyclic dinucleo- 
tides'' and because STING expression is low or undetectable in 
HEK293T cells (Supplementary Fig. 1 and data not shown), we first 
expressed STING in HEK293T cells. The overexpression of STING 
spontaneously induces an IFN reporter**’, so we transfected a small 
amount of Sting-encoding vector that by itself was insufficient to 
induce IFN. Low levels of STING protein were sufficient to reconstitute 
the responsiveness of HEK293T cells to c-di-GMP (Fig. 1b) and 
c-di-AMP (Fig. 1c). By contrast, the non-functional goldenticket (gt) 
allele of Sting (which results in a STING protein in which asparagine 
has been substituted for isoleucine, I199N)"' did not restore respon- 
siveness to c-di-GMP (Fig. 1b). Interestingly, the expression of wild- 
type STING did not reconstitute the responsiveness of HEK293T cells 
to double-stranded DNA (dsDNA) oligonucleotides (for example, a 
70-base-pair oligonucleotide from vaccinia virus (VV70mer) or IFN- 
stimulatory DNA (ISD)) that had previously been shown to induce 
type I IFNs in macrophages through STING*’® (Fig. 1b and Supplementary 
Fig. 2a). By contrast, the induction of IFN by poly(dA-dT)·poly(dT-dA) 
DNA (denoted poly(dAT:dTA)) was identical in 
cells that were transfected with wild-type Sting and those transfected 
with gt, demonstrating that the RNA polymerase III DNA-sensing 
pathway’”"* is intact in these cells and is not responsible for the detection 
of c-di-GMP (Fig. 1d). As a positive control, Myd88−/− Trif−/− 
immortalized macrophages, which express STING, responded 
similarly to c-di-GMP, poly(dAT:dTA), VV70mer and ISD (Fig. 1e 
and Supplementary Fig. 2b). Together, our results show that STING 
expression is sufficient to restore the responsiveness of HEK293T cells 
to cyclic dinucleotides but not to DNA. 

We next tested whether STING, or perhaps another protein in 
HEK293T cells, binds to c-di-GMP. We used HEK293T cell lysates 
in an in vitro ultraviolet radiation crosslinking assay to identify putative 
sensor proteins that interact directly with radiolabelled c-di-GMP 
(c-di-[³²P]GMP). We expected to identify directly interacting proteins 
because only molecules within bond-length proximity are efficiently 
crosslinked by ultraviolet radiation’. We detected a prominent ~40- 
kDa radiolabelled protein, which corresponds to the predicted 
molecular weight of monomeric STING, in the lysates of cells trans- 
fected with a vector encoding haemagglutinin (HA)-tagged STING 
(STING-HA) but not in the lysates of cells transfected with a vector 
encoding STING(I199N)-HA or vector only (Fig. 2a). The ~40-kDa 
band did not appear when the same lysates were crosslinked in the 
presence of [³²P]GTP, implying that crosslinking to c-di-[³²P]GMP
was specific (Fig. 2a). We also observed an ~80-kDa species, which
might correspond to a previously reported STING dimer (Fig. 2b).
To test the hypothesis that STING crosslinks with c-di-[³²P]GMP,
we immunoprecipitated STING from transfected HEK293T
cells and performed the c-di-[³²P]GMP crosslinking assay on the
immunoprecipitates. Bands corresponding to the molecular weight of
the STING monomer and dimer were identified only in immuno- 
precipitates from lysates overexpressing STING and not in mock 
immunoprecipitates from lysates of vector-only-transfected cells 
(Fig. 2b). Thus, STING seems to bind to c-di-GMP. 

To confirm that the binding of c-di-[³²P]GMP to STING is specific,
we performed the c-di-GMP crosslinking assay in the presence of 


Figure 1 | STING is sufficient to restore responsiveness to cyclic
dinucleotides. a, Immortalized Myd88−/− Trif−/− macrophages were
transduced with retrovirus expressing rocR and then stimulated for 6 h. IFN
induction was measured by quantitative PCR with reverse transcription
(quantitative RT-PCR) and normalized to ribosomal protein S17 (Rps17)
expression levels. b–d, HEK293T cells were transfected as indicated, together
with an IFN-luciferase reporter, and luciferase activity was measured 6 h after
stimulation. e, Myd88−/− Trif−/− macrophages were stimulated for 6 h, and
IFN induction was measured as in a. a–e, Data are presented as mean ± s.d.
(n = 3) and are representative of at least three independent experiments. ***,
P < 0.001. SeV, Sendai virus; TMEV, Theiler’s murine encephalomyelitis virus;
unstim, unstimulated; VV70mer, a stimulatory dsDNA oligonucleotide derived
from vaccinia virus.


unlabelled nucleotides. Unlabelled c-di-GMP and c-di-AMP specifically
competed with c-di-[³²P]GMP for binding to STING (Fig. 2c,
d). By contrast, GTP, other guanosine derivatives and nucleic acids
(including dsDNA) competed away nonspecific binding (Fig. 2c, d,
asterisk); however, under our specific assay conditions, these molecules
could not compete efficiently with c-di-[³²P]GMP for binding to
STING (Fig. 2c, d, arrow). Because the cell cytosol contains high
concentrations of GTP (0.1–1 mM), a putative c-di-GMP sensor must
have a high degree of specificity for c-di-GMP over GTP. We found 
that c-di-GMP efficiently crosslinked to STING even in the presence of 
1 mM GTP (Fig. 2c). 

Although these data imply that STING directly and specifically 
binds to cyclic dinucleotides, they do not address whether other host 
proteins might also be required. STING is predicted to encode an 
amino-terminal domain with multiple transmembrane segments, fol- 
lowed by a globular carboxy-terminal domain (CTD). Because the 
CTD contains the amino acid substitution that abolishes STING function
in gt mice (I199N), we suspected that the CTD might be involved
in binding to cyclic dinucleotides. Thus, we subjected purified
recombinant His6-tagged STING CTD (amino acids 138–378) (Fig. 2e) to the
c-di-[³²P]GMP crosslinking assay. We found that the recombinant
CTD of STING bound to c-di-[³²P]GMP and that this binding
was specifically competed away with cold (unlabelled) c-di-GMP or
c-di-AMP but not with cold GTP or ATP (Fig. 2f). We used equilibrium


[Figure 2 image: autoradiographs, western blots and colloidal blue staining for panels a–f; panel g shows the equilibrium dialysis binding curve with fitted values Bmax = 0.5 ± 0.001, h = 1.4 ± 0.1 and Kd = 4.9 µM.]


Figure 2 | STING binds cyclic dinucleotides. a, HEK293T cells were
transfected as indicated, and the cell lysates were subjected to an in vitro
ultraviolet radiation crosslinking assay with c-di-[³²P]GMP. Samples were
separated by SDS-PAGE and visualized by autoradiography or western
blotting. b, HEK293T cells were transfected as in a, and anti-HA
immunoprecipitates were stained with colloidal blue or subjected to the
crosslinking assay with c-di-[³²P]GMP. c, d, HEK293T cells were transfected as
in a, and the cell lysates were crosslinked with ultraviolet radiation to
c-di-[³²P]GMP or [³²P]GTP in the presence of cold competing nucleotides in
tenfold serial dilutions beginning at 1 mM (as indicated by the wedges), except
for guanosine (beginning at 0.1 mM), VV70mer (500 µg ml⁻¹),
poly(dAT:dTA) (50 µg ml⁻¹) and poly(I)·poly(C) (denoted poly(I:C))
(50 µg ml⁻¹). The arrow indicates STING, and the asterisk indicates a
nonspecific band. nt, nucleotides; ppGpp, guanosine-3′,5′-bisdiphosphate.
e, His6-STING (amino acids 138–378) (1 µg) was separated by SDS-PAGE and
stained with Coomassie blue. f, His6-STING 138–378 was analysed as in
c. g, The binding of c-di-GMP to purified His6-STING (10 µM) was measured
by equilibrium dialysis. Bmax, maximum number of binding sites; h, Hill
coefficient; Kd, dissociation constant. a–g, Data are representative of three
independent experiments.
dialysis to obtain an estimate of the affinity (dissociation constant,
Kd) of c-di-GMP binding to the STING CTD, which was ~5 µM
(Fig. 2g). In its native membrane-bound form, or in complex with
other host factors, STING may have a stronger affinity for
c-di-GMP; nevertheless, a 5 µM affinity is consistent with the dose response
that has previously been observed in macrophages. Consistent with the
ability of STING to dimerize, the binding data suggest a stoichiometry
of one molecule of c-di-GMP per two molecules of STING.

To identify the amino acids involved in c-di-GMP binding and/or 
IFN induction, we introduced point mutations into STING. Focusing
on clusters of conserved and charged residues, we mutated 67 amino
acids, either individually or in groups, and we classified these mutants
into five categories (Fig. 3, Supplementary Table 1 and Supplementary
Figs 3 and 4). Class I consists of mutants that abolish both binding and
IFN induction (Fig. 3a–c, red, and Supplementary Table 1). Class II
mutants bind to c-di-GMP but fail to induce IFN (Fig. 3c, purple).
Class III comprises ‘hyperactive’ mutants, which spontaneously induce
IFN at low levels of transfection (Fig. 3a–c, green, and Supplementary
Table 1). Class IV mutants induce IFN when overexpressed but do not
Figure 3 | Mutational analysis of STING. a, HEK293T cells were transfected
with vectors encoding STING and STING mutants as indicated, together with
an IFN-luciferase reporter, and luciferase activity was measured 6 h after
stimulation. Data are presented as mean ± s.d. (n = 3). b, HEK293T cells were
transfected as in a, except that the lysates were subjected to the crosslinking
assay with c-di-[³²P]GMP as in Fig. 2a. c, Organization of STING based on the
membrane topology prediction programs SOSUI, TMHMM, HMMTOP and
TMpred. Strongly predicted transmembrane domains are boxed; weakly
predicted transmembrane domains have dashed boxes. Coloured residues
indicate mutant classes (see text for description of mutants): class I (red), class II
(purple), class III (green), class IV (blue) and class V (yellow). Bracketed
mutations were made in combination. a–c, Data are representative of at least
three independent experiments.
respond to c-di-GMP (Fig. 3a–c, blue, and Supplementary Table 1).
Class V consists of mutants that have no effect on c-di-GMP binding or 
IFN induction (Fig. 3c, yellow, and Supplementary Table 1). Although 
mutating STING can result in diverse phenotypes, a key finding is that 
all mutants that failed to bind to c-di-GMP also lost the ability to 
induce IFN in response to c-di-GMP. Consistent with our observation 
that the CTD is sufficient for binding to c-di-GMP (Fig. 2f), all muta- 
tions that affected c-di-GMP binding were located within the CTD. 
DNA and cyclic dinucleotides induce indistinguishable transcriptional
responses in macrophages, and STING seems to be essential
for both responses. However, we found that STING expression is
insufficient to restore the responsiveness of HEK293T cells to DNA, in 
contrast to cyclic dinucleotides (Fig. 1b and Supplementary Fig. 2a). 
Moreover, our competition assays indicate that DNA does not com- 
pete with cyclic-di-GMP for binding to STING under the conditions 
tested (Fig. 2d). Thus, although our data indicate that STING functions 
as a direct immunosensor of cyclic dinucleotides, it seems likely that 
additional host proteins are involved in IFN induction by DNA. 
Indeed, two candidate DNA sensors, DAI (also known as ZBP1) and 
IFI16, have been identified, neither of which seems to be essential
for the response to cyclic dinucleotides (ref. 10 and data not shown). 
To determine whether the responsiveness to cyclic dinucleotides and 
DNA are separable functions of STING, we sought to identify STING 
mutants that fail to respond to cyclic dinucleotides but still respond to 
DNA. We identified a STING mutant (R231A) that was unresponsive 
to c-di-GMP (Fig. 4a), although it induced IFN when overexpressed 
(Fig. 4a) and bound to c-di-GMP (Fig. 4b). Interestingly, STING 
R231A was able to restore the responsiveness of gt bone marrow 
macrophages to DNA but not to cyclic-di-GMP (Fig. 4c). Thus, cyclic 
Figure 4 | The IFN response to DNA and c-di-GMP can be uncoupled.
a, HEK293T cells were transfected as indicated, together with an IFN-luciferase
reporter, and IFN reporter activity was measured 6 h after stimulation.
b, HEK293T cells were transfected as in a, except that the lysates were subjected
to the crosslinking assay with c-di-[³²P]GMP as in Fig. 2a. c, Bone-marrow-derived
macrophages from Sting-deficient (gt) mice were transduced with the
indicated constructs. IFN induction in response to transfected cyclic-di-GMP
or VV70mer was measured by quantitative RT-PCR and normalized to Rps17.
**, P < 0.001; NS, not significant, P = 0.1205. a, c, Data are presented as
mean ± s.d. (n = 3). a–c, Data are representative of at least three independent
experiments.
dinucleotide sensing and DNA sensing can be uncoupled, suggesting
that these two pathways are discrete but share STING as a common 
signalling molecule. It is unexpected that STING would function both 
as a direct immunosensor (of cyclic dinucleotides) and as a signalling 
adaptor (in the response to DNA). One possibility is that STING 
initially evolved as a cyclic dinucleotide sensor and was subsequently 
co-opted for DNA sensing. 

We previously used mouse mutagenesis to identify STING as an 
essential molecule in the in vivo IFN response to cyclic dinucleotides.
The requirement for STING can now be rationalized by our proposal 
that STING functions as a direct sensor of cyclic dinucleotides. 
Interestingly, STING does not share homology with any known 
immunosensor and therefore seems to represent a novel category of 
microbial detector. Although a BLAST search of the mouse proteome 
for homologues of the L. monocytogenes diadenylate cyclase (lmo2120;
also known as DacA) identifies STING as the top hit, the homology is 
limited to a short region of the STING CTD (amino acids 311-358). 
STING does not seem to share homology with PilZ-domain-containing 
proteins, which function as c-di-GMP receptors in bacteria. Structural
studies are required to better characterize the interaction of STING with 
cyclic dinucleotides and to determine whether STING resembles any 
known protein in mammals or bacteria. 

Numerous studies have demonstrated that cyclic dinucleotides are 
potent immunostimulatory compounds that may be valuable as novel 
immunotherapeutics or adjuvants. The therapeutic development of
cyclic dinucleotides will be greatly facilitated by an improved under- 
standing of the mechanism by which they are sensed. Furthermore, 
our finding that STING is a direct detector of cyclic dinucleotides 
provides insight into the fundamental mechanisms by which the 
innate immune system can detect bacterial infection. 


METHODS SUMMARY 

Transfections. Transfections were carried out using Lipofectamine 2000 
(Invitrogen) according to the manufacturer’s instructions. Digitonin
permeabilization was used to introduce c-di-AMP into cells as described previously.
Recombinant STING. DNA encoding the CTD of mouse STING (nucleotides 
414-1,137) was cloned into the vector pET28a for recombinant protein expression 
in Escherichia coli. 

Ultraviolet radiation crosslinking. c-di-[³²P]GMP was enzymatically synthesized
using recombinant WspR and was used in an ultraviolet radiation crosslinking
assay as described previously. Briefly, 50 µg HEK293T cell lysate at a final
concentration of 2 µg µl⁻¹, or 1 µg recombinant His-tagged STING, was incubated
with 2 µCi c-di-[³²P]GMP in binding buffer (20 mM Tris-HCl, pH 7.4, 200 mM
NaCl and 1 mM MgCl2) for 15 min at 25 °C. The reactions were irradiated at
254 nm, and the proteins in the samples were then separated by SDS-PAGE. 
Statistical analysis. Statistical differences were calculated with an unpaired two- 
tailed Student’s t-test using Prism 5.0b software (GraphPad). 
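The comparison named here can be sketched in a few lines. The following is a minimal illustration of the test statistic for an unpaired, equal-variance Student's t-test, not the authors' analysis (which was run in Prism). The function name and the triplicate values are hypothetical, and the two-tailed P value itself would still need the t distribution from a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t_statistic(group_a, group_b):
    """Return (t, degrees_of_freedom) for an equal-variance unpaired t-test."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    df = n_a + n_b - 2
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / standard_error
    return t, df


# Hypothetical triplicate readings (n = 3 per condition), for illustration only.
unstimulated = [1.0, 2.0, 3.0]
stimulated = [4.0, 5.0, 6.0]
t_value, dof = unpaired_t_statistic(unstimulated, stimulated)
print(t_value, dof)
```

With n = 3 per group, as in the figures, the test has only four degrees of freedom, so large effect sizes are needed to reach P < 0.001.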


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS

Synthesis of c-di-[³²P]GMP. The synthesis of c-di-[³²P]GMP was carried out as
described previously. Briefly, recombinant His6-tagged WspR was incubated
with [α-³²P]GTP (3,000 Ci mmol⁻¹, 10 µCi µl⁻¹, Amersham Biosciences) for
2 h at 25 °C, followed by heat inactivation of WspR at 95 °C for 5 min. Residual
*°p was removed by incubation with calf intestinal phosphatase (CIP) (New 
England Biolabs) for 10 min at 37 °C. CIP was heat inactivated at 95 °C for
5 min, followed by centrifugation at 16,000g for 5 min. The [³²P]GTP used as a
negative control was prepared identically except that His-tagged WspR and CIP
were omitted from the preparation. Radiolabelled nucleotides were quantified by
separation by thin-layer chromatography on cellulose-PEI plates
(Macherey-Nagel) using 1.5 M KH2PO4, pH 3.65 (Supplementary Fig. 5).

Cell lines and animals. C57BL/6 Myd88−/− Trif−/− mice were obtained from G.
Barton, and immortalized macrophages were generated as described previously.
Immortalized bone marrow macrophages were maintained in RPMI 1640 
(Invitrogen) supplemented with 10% FBS, penicillin-streptomycin and glutamine. 
HEK293T cells were maintained in DMEM supplemented with 10% FBS, penicillin- 
streptomycin and glutamine. Animal use was approved by the Animal Care and Use 
Committee at the University of California, Berkeley. 

Plasmids. A construct encoding RocR (NP_252636) from P. aeruginosa was a gift
from S. Lory. rocR was cloned into the MSCV2.2 retroviral expression construct
upstream of an internal ribosome entry site (IRES)-green fluorescent protein
(GFP). MSCV-rocR was transduced into immortalized macrophages from
Myd88−/− Trif−/− mice, and cells were sorted for GFP expression. Mouse Sting
and the gt (I199N) mutant allele of Sting were cloned into the vector pcDNA3 with
a C-terminal HA tag as described previously. DNA encoding the CTD of mouse
Sting (nucleotides 414-1,137) was cloned into pET28a for recombinant protein 
expression in Escherichia coli. 

Site-directed mutagenesis. Mutations in Sting were generated using the 
QuikChange Site-Directed Mutagenesis Kit (Stratagene) according to the manu- 
facturer’s guidelines. 

Reagents. c-di-GMP was synthesized as described previously. Purified c-di-AMP
was a gift from J. Woodward and D. Portnoy. Poly(dAT:dTA), GTP, ATP, GMP and
guanosine were obtained from Sigma-Aldrich. Poly(I)·poly(C) (denoted poly(I:C))
was purchased from Invivogen. Guanosine-3′,5′-bisdiphosphate (ppGpp) was
obtained from TriLink Biotechnologies. Sendai virus was purchased from Charles 
River Laboratories. Theiler’s murine encephalomyelitis virus (TMEV) strain GDVII 
was provided by M. Brahic and E. Freundt. DNA oligonucleotides corresponding to 
the VV70mer and ISD were purchased from Elim Biopharmaceuticals and were 
annealed as described previously.

Cell stimulation. All transfections (except those for c-di-AMP) were carried out
using Lipofectamine 2000 (Invitrogen) according to the manufacturer’s
instructions. The VV70mer was transfected at a final concentration of 0.5 µg ml⁻¹.
Poly(dAT:dTA), poly(I:C) and c-di-GMP were transfected at a final concentration
of 4 µg ml⁻¹. c-di-AMP was used at a final concentration of 5.4 mM, and
stimulation was performed using digitonin permeabilization as described previously.
Luciferase assay. HEK293T cells were plated in TC-treated 96-well plates at
0.5 × 10⁶ cells ml⁻¹. The next day, the cells were transfected as indicated, together
with IFN-β-firefly luciferase and TK-Renilla luciferase reporter constructs.
Following stimulation for 6 h with the indicated ligands, the cells were lysed in
passive lysis buffer (Promega) for 5 min at 25 °C. The cell lysates were incubated 
with firefly luciferase substrate (Biosynth) and the Renilla luciferase substrate 
coelenterazine (Biotium), and luminescence was measured on a SpectraMax L 
microplate reader (Molecular Devices). The relative Ifnb expression was calculated 
as firefly luminescence relative to Renilla luminescence. 
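The normalization described above can be written out explicitly. This is a minimal sketch, assuming one firefly and one Renilla reading per well and triplicate wells per condition; the function names and luminescence values are hypothetical.

```python
from statistics import mean, stdev


def relative_ifnb(firefly, renilla):
    """Firefly luminescence expressed relative to the Renilla control."""
    if renilla <= 0:
        raise ValueError("Renilla luminescence must be positive")
    return firefly / renilla


def summarize_wells(wells):
    """Return (mean, s.d.) of the per-well firefly/Renilla ratios.

    `wells` is a list of (firefly, renilla) readings, one pair per well.
    """
    ratios = [relative_ifnb(firefly, renilla) for firefly, renilla in wells]
    return mean(ratios), stdev(ratios)


# Hypothetical triplicate wells for one condition.
wells = [(200.0, 50.0), (300.0, 50.0), (250.0, 50.0)]
print(summarize_wells(wells))
```

Dividing within each well before averaging means that well-to-well differences in transfection efficiency, which the TK-Renilla reporter tracks, are corrected before the spread is computed.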

Quantitative PCR. The analysis of Ifnb expression by bone marrow macrophages 
was conducted as described previously.

Preparation of HEK293T cell lysates and immunoprecipitations. HEK293T 
cells were plated at a density of 1 × 10⁶ cells well⁻¹ in a 6-well plate. The following
day, the cells were transfected with pcDNA3 or pcDNA3 expressing HA-tagged
wild-type or mutant STING using Lipofectamine 2000 (Invitrogen). The day after
that, the cells were rinsed once with PBS and transferred to Eppendorf tubes in PBS
containing 1 mM EDTA. The cells were pelleted briefly by centrifugation at 1,000g
at 4 °C. The cell pellet was lysed in an equal volume of digitonin lysis buffer (0.5%
digitonin, 20 mM Tris-HCl, pH 7.4, and 150 mM NaCl) containing protease
inhibitors (Roche) for 10 min on ice. The cell lysates were centrifuged at 10,000g for
10 min at 4 °C. The protein concentration in the resultant supernatant was
measured using the Bradford reagent (Bio-Rad). The cell lysates were subjected to a
c-di-GMP binding (crosslinking) assay (see below). The lysates were separated by
SDS-PAGE, and the separated proteins were transferred to a nitrocellulose
membrane, which was then probed with rat anti-HA antibodies (Roche), to confirm
STING-HA expression, and mouse anti-β-actin antibodies (Santa Cruz
Biotechnology). To immunoprecipitate HA-tagged STING, the cell lysates were
prepared similarly in digitonin lysis buffer and incubated with
anti-HA-antibody-conjugated agarose beads (Sigma) for 2 h at 4 °C. Washed beads were subjected to
a c-di-GMP binding assay or separated by SDS-PAGE and stained with colloidal 
blue protein stain (Thermo Scientific). 

c-di-GMP binding assays. The c-di-GMP binding assay (also called the
crosslinking assay) was based on a method described previously. Briefly, 50 µg
HEK293T cell lysate at a final concentration of 2 µg µl⁻¹, or 1 µg recombinant
His6-tagged STING, was incubated with 2 µCi radiolabelled nucleotide in binding
buffer (20 mM Tris-HCl, pH 7.4, 200 mM NaCl and 1 mM MgCl2) for 15 min at
25 °C. The reactions were irradiated at 254 nm for 20 min on ice at a 3-cm distance 
from a UVG-54 mineral light lamp (UVP). Immediately after crosslinking, the 
reactions were terminated by the addition of SDS sample buffer (40% glycerol, 8% 
SDS, 2% 2-mercaptoethanol, 40mM EDTA, 0.05% bromophenol blue and 
250 mM Tris-HCl, pH 6.8), boiled for 5 min and then separated by SDS-PAGE. 
The gels were dried, exposed to a phosphor screen and visualized using a Typhoon 
Trio imager (GE Healthcare). 

Protein purification. The construct expressing a constitutively active form of
WspR (pQE-WspR*) was a gift from S. Lory. The purification of His6-tagged
WspR was carried out as described previously, using Ni-NTA affinity
chromatography (Qiagen). DNA encoding the CTD of mouse STING (nucleotides
414–1,137) was cloned into the vector pET28a and purified by Ni-NTA affinity
chromatography (Qiagen) according to the manufacturer’s instructions. 
Equilibrium dialysis. The binding affinity of radioactive c-di-GMP was measured
by equilibrium dialysis, using a 96-well equilibrium dialyser (Harvard Apparatus)
with a 5,000 molecular weight cut-off membrane. One chamber contained 150 µl
10 µM purified His6-tagged STING(138–378) in assay buffer (25 mM Tris-HCl,
pH 7.4, 100 mM NaCl, 1 mM MgCl2 and 10% glycerol), while the other was filled
with 150 µl c-di-[³²P]GMP at a range of concentrations (40–160 µM). Equilibrium
was reached after 48 h at 25 °C, and three samples were drawn from each chamber
and mixed with 2 ml Econo-Safe scintillation fluid. Samples were measured in an
LS 6000 IC scintillation counter (Beckman). Data analysis was performed using
Prism 5.0b software (GraphPad). The dissociation constant (Kd), the maximum
number of binding sites (Bmax) and the Hill coefficient (h) were generated using
nonlinear regression, allowing one-site specific binding with a Hill slope.
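The binding model named above has a simple closed form, B = Bmax · X^h / (Kd^h + X^h), where X is the free ligand concentration. The sketch below only evaluates that curve; it is not the regression itself, and the function name is hypothetical. The parameter values in the example are those reported in Fig. 2g.

```python
def specific_binding_hill(ligand, bmax, kd, hill):
    """One-site specific binding with a Hill slope.

    ligand: free ligand concentration (same units as kd)
    bmax:   maximum binding at saturation
    kd:     ligand concentration giving half-maximal binding
    hill:   Hill coefficient (1.0 reduces to a simple hyperbola)
    """
    if ligand < 0:
        raise ValueError("ligand concentration cannot be negative")
    if ligand == 0:
        return 0.0
    return bmax * ligand ** hill / (kd ** hill + ligand ** hill)


# Fitted values from Fig. 2g: Bmax = 0.5, h = 1.4, Kd = 4.9 (µM).
for concentration in (0.1, 1.0, 4.9, 10.0, 100.0):
    bound = specific_binding_hill(concentration, bmax=0.5, kd=4.9, hill=1.4)
    print(concentration, round(bound, 3))
```

By construction the curve passes through Bmax/2 at X = Kd, and a Bmax near 0.5 ligand per protein is what the text reads as one c-di-GMP per two STING molecules.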
Temporal dynamics and genetic control of
transcription in the human prefrontal cortex 


Carlo Colantuoni, Barbara K. Lipska, Tianzhang Ye, Thomas M. Hyde, Ran Tao, Jeffrey T. Leek,
Elizabeth A. Colantuoni, Abdel G. Elkahloun, Mary M. Herman, Daniel R. Weinberger & Joel E. Kleinman


Previous investigations have combined transcriptional and genetic
analyses in human cell lines, but few have applied these
techniques to human neural tissue. To gain a global molecular
perspective on the role of the human genome in cortical development,
function and ageing, we explore the temporal dynamics and genetic 
control of transcription in human prefrontal cortex in an extensive 
series of post-mortem brains from fetal development through 
ageing. We discover a wave of gene expression changes occurring 
during fetal development which are reversed in early postnatal life. 
One half-century later in life, this pattern of reversals is mirrored in 
ageing and in neurodegeneration. Although we identify thousands 
of robust associations of individual genetic polymorphisms with 
gene expression, we also demonstrate that there is no association 
between the total extent of genetic differences between subjects and 
the global similarity of their transcriptional profiles. Hence, the 
human genome produces a consistent molecular architecture in 
the prefrontal cortex, despite millions of genetic differences across 
individuals and races. To enable further discovery, this entire data 
set is freely available (from Gene Expression Omnibus: accession 
GSE30272; and dbGaP: accession phs000417.v1.p1) and can also be 
interrogated via a biologist-friendly stand-alone application
(http://www.libd.org/braincloud).

The temporal dynamics of genome expression throughout the body 
and its genetic and epigenetic control are central to a synthetic under- 
standing of how a relatively small number of DNA molecules can give 
rise to an entire human. Similarly, temporal expression patterns in 
neural tissue and their regulation across the lifespan will elucidate 
molecular mechanisms involved in the formation, mature function 
and degeneration of the human brain. 

Previous studies have combined transcriptome and genetic analyses 
to investigate the genetic control of gene expression in human cell 
lines. Few studies have applied these genomic techniques to human
neural tissue or human brain disease. Others have focused on the
transcriptome in human fetal brain tissue, temporal patterns of gene
expression in postnatal life, and gene co-expression patterns in the
brain. Here we describe the combination of genome-wide DNA
and RNA analyses in a large collection of meticulously curated human 
brain specimens to produce a comprehensive view of how the expres- 
sion of the human genome in the prefrontal cortex (PFC) progresses 
from fetal development through ageing and how sequence variation in 
the genome impacts on these expression patterns. 

The post-mortem brain tissue collection (n = 269 subjects without 
neuropathological or neuropsychiatric diagnosis) spans the majority 
of the human lifespan (Fig. 1a, b). From each subject in the brain
collection, RNA from PFC grey matter was analysed using spotted 
oligonucleotide microarrays yielding data from 30,176 gene expression 
probes. DNA from cerebellar tissue was studied with Illumina 
BeadChips producing 625,439 single nucleotide polymorphism 
(SNP) genotypes for each subject. 


The absolute rate of expression change within each life stage was 
quantified for all genes using linear models (Fig. 1b, box plot). The rate 
of expression change during fetal development is much faster than at 
any other stage in human life. Changes during infancy are much 
slower, yet still more rapid than at any later time in life. After the first 
half year of postnatal life, rates of expression change slow markedly, 
and continue to slow during the childhood and teenage years, sub- 
sequently maintaining a low rate of change through the 20s, 30s and 
40s. After this period, rates of expression change begin to rise again 
through several decades, and in the aged human brain, change reaches 
and then exceeds rates observed during teenage years. 
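One way to picture the per-stage rate is as the absolute slope of an ordinary least-squares line fitted to one probe's expression against age within a single life stage. The sketch below assumes that reading; the function names and the data points are hypothetical, and the authors' linear models may include terms not shown here.

```python
def ols_slope(ages, expression):
    """Least-squares slope of expression regressed on age."""
    if len(ages) != len(expression) or len(ages) < 2:
        raise ValueError("need at least two paired observations")
    n = len(ages)
    mean_age = sum(ages) / n
    mean_expr = sum(expression) / n
    sxx = sum((a - mean_age) ** 2 for a in ages)
    if sxx == 0:
        raise ValueError("ages must not all be identical")
    sxy = sum((a - mean_age) * (e - mean_expr) for a, e in zip(ages, expression))
    return sxy / sxx


def absolute_rate(ages, expression):
    """Absolute rate of expression change (magnitude of the fitted slope)."""
    return abs(ols_slope(ages, expression))


# Hypothetical probe measured in four subjects within one life stage.
ages = [0.0, 1.0, 2.0, 3.0]
expression = [5.0, 3.0, 1.0, -1.0]
print(absolute_rate(ages, expression))
```

Taking the absolute value lets increasing and decreasing genes contribute equally, which is why a box plot of these rates can summarize how fast the transcriptome is changing regardless of direction.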

The distribution of expression trajectory turning points was inves- 
tigated across postnatal life (Fig. 1b, grey histogram). Rates of expres- 
sion change decrease from childhood through the teenage years (blue 
boxes) as many genes redirect expression trajectories (peak in grey 
histogram near 20 years). In contrast, in ageing, expression change 
accelerates (yellow-orange boxes) as more genes enter turning points 
(the minor peak in the grey histogram near 60 years). 

The correlation of expression measures across subjects was explored 
within each age stage and between adjacent stages (Fig. 1b, points). 
Transcription in PFC appears most similar across individuals at the 
beginning of life and then again to a lesser extent nearer its end, 
demonstrating the most diversity during the years of mature brain 
functioning, when age-dependent rates of expression change are 
lowest (this observation is also clear in Fig. 1c). The separation of 
mean within- and between-age stage correlations observed early in 
life indicates the occurrence of fundamentally distinct transcriptional 
programs within fetal, infant and childhood development, followed by 
a smoother more continuous progression of change throughout the 
rest of the lifetime. 

To obtain a global perspective on transcription in PFC across the 
human lifetime, expression profile correlations were combined with 
multidimensional scaling (MDS) to reduce the complexity of the 
expression data and produce an intuitive visualization of global pat- 
terns (Fig. 1c). The spatial progression of the colour scale in this plot is 
a reflection of age-dependent change in human PFC transcription. 
Even within the brief 6-week range of fetal development examined, 
there is clearly observable systematic expression change with time 
(along the vertical axis). Following fetal development, the path of 
global transcriptional change alters markedly, progressing steadily 
away from the fetal state through the neonatal, infant and childhood 
ages, each of which has a relatively distinct identity compared with 
other periods (across the horizontal axis). A second redirection of 
global transcriptional change occurs at the end of the teenage years 
(also observed from a different perspective in Fig. 1b, grey histogram), 
followed by a more linear progression through adulthood and into 
ageing. This global view was also used to inspect the effects of covariates 
(Supplementary Fig. 1). 
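The input to MDS here is a matrix of pairwise distances between samples derived from expression correlation. A minimal sketch of that step follows, assuming the distance between two samples is 1 minus their Pearson correlation; the function names and the toy profiles are hypothetical, and the MDS embedding itself is not shown.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_distance_matrix(profiles):
    """Pairwise distances (1 - r) between samples; 0 means identical shape."""
    n = len(profiles)
    return [
        [1.0 - pearson_r(profiles[i], profiles[j]) for j in range(n)]
        for i in range(n)
    ]


# Three hypothetical samples measured on three probes.
profiles = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
for row in correlation_distance_matrix(profiles):
    print([round(d, 3) for d in row])
```

Under this definition perfectly correlated samples sit at distance 0 and perfectly anticorrelated samples at distance 2, so proximity in the resulting plot reflects similarity of expression profiles.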
Figure 1 | A global view of the PFC transcriptome. a, Histogram of subject
ages in the brain collection. Birth is indicated by the dashed white line. This
colour scale serves as the legend for all panels in Fig. 1 and Fig. 2b. b, Box plot of
absolute rates of expression change within each stage of life. Because the rates of
expression change are so high early in life, the y-axis scale is different for fetal
and infant stages than for all other stages. The two horizontal dotted lines in the
left panel show the entire extent of the y axis in the right panel. Only microarray
probes showing systematic variation with age (R² > 0.5) were included
(n = 8,704 probes). Age ranges: fetal, 14–20 gestational weeks; infant,
0–6 months; child, 1–10 years; decades as labelled. Open points depict the mean
expression correlation across subjects within each age stage (Pearson’s r
calculated across all expression measures; y axis scale at far right). Filled points


In another global view of prefrontal transcription, the age effect 
within the fetal samples is effectively illustrated using principal com- 
ponents analysis (PCA, Fig. 1d). The first principal component (PC1) 
separates the fetal from postnatal samples, whereas the second (PC2) 
appears to align with age effects within both the fetal and postnatal 
samples. The directions of the fetal and postnatal age effects along PC2 
appear to be in opposition. Additionally, fetal expression changes are 
negatively correlated with those in other stages of early life: infancy 
r=—0.45, P=1.3X10 ”° childhood r= —0.48, P=1.5 X 10°; 
and teenage years r= —0.18, P= 2.3 X 10 * (including only probes 
with slopes at P< 0.05 in both stages, Supplementary Table 1). This 
might indicate that select fetal expression changes are reversed at 
different times across the lifespan, beginning immediately after birth. 

To investigate further this observation of reversing trajectories, 
genes showing significant expression change across age in both fetal 
and infant development were compared directly (Fig. 2a, b). 
Approximately three-quarters of genes showing significant change 
in both stages reverse their direction of expression change between 
fetal and early postnatal life, with most changing from an increase in 
utero to a decrease in the months after birth. 
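The quadrant comparison can be expressed as a sign test on paired slopes. The sketch below assumes each gene has already passed the significance filter in both stages, so neither slope is zero; the function names, quadrant labels and slope values are hypothetical.

```python
def quadrant(fetal_slope, infant_slope):
    """Label a gene by the direction of change in each stage."""
    fetal = "up" if fetal_slope > 0 else "down"
    infant = "up" if infant_slope > 0 else "down"
    return f"fetal-{fetal}/infant-{infant}"


def fraction_reversing(slope_pairs):
    """Fraction of genes whose direction of change flips between stages."""
    if not slope_pairs:
        raise ValueError("no genes supplied")
    reversing = sum(1 for fetal, infant in slope_pairs if fetal * infant < 0)
    return reversing / len(slope_pairs)


# Hypothetical (fetal slope, infant slope) pairs for four genes.
slope_pairs = [(1.0, -1.0), (2.0, -0.5), (-1.0, 1.0), (1.0, 1.0)]
print([quadrant(f, i) for f, i in slope_pairs])
print(fraction_reversing(slope_pairs))
```

In this toy set three of four genes reverse, mirroring the roughly three-quarters reported in the text, with the fetal-up/infant-down quadrant the most populated.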

To gain functional insight into these changing expression patterns, 
the genes within each of the quadrants in Fig. 2a were interrogated for 
the over-representation of functional gene groups. Detailed functional 
group lists for each of the quadrants are contained in Supplementary 
Table 3. This examination of gene expression trajectories in early life 
may give a global genomic perspective on mechanisms in neural 
[Fig. 1d axis label: PC1, 44.3% of variance]


depict the mean expression correlation across subjects between adjacent age 
stages. The grey histogram displays the distribution of ages that marked a 
change in the trajectory of expression for genes across the postnatal lifespan 
(see Methods). c, A global view of dynamic PFC transcription across the 
lifespan, using MDS (distance = 1 − r, stress = 20.5) and expression
correlation. Each mRNA sample is represented as a single point coloured by the 
age of the subject. Pairwise distances are derived from correlation across all 
expression measures between subjects, such that proximity indicates similarity, 
whereas distance indicates dissimilarity. d, PCA of transcription in PFC across 
the lifespan. PCA was performed on data from all gene expression probes to 
represent each mRNA sample as a single point coloured by the age of the 
subject. 


development that have been well studied at the individual cell and gene 
level: genes involved in cell division are over-represented among genes 
for which expression decreases during both fetal development and 
infancy. Inversely, genes related to the synapse are over-represented 
among genes showing expression increases during both stages. This 
pair of findings is a genomic reflection of the well-characterized 
decrease in cell proliferation with opponent increase in neuronal dif- 
ferentiation through both late fetal and early infant development. 

In contrast to synaptic components, genes with axonal function are 
highly enriched among genes showing increasing expression during 
fetal development followed by decreases after birth. This coordinated 
reversal of expression trajectories among axonal genes while many 
synaptic genes continue to increase in infancy is probably a genomic 
view of the process of pruning exuberant axons while synapse develop- 
ment and maturation at appropriate target sites advance’’. Specific 
gene expression changes in synaptic and axonal genes during fetal 
and infant life are listed in Supplementary Table 4. 

Genes in ATP synthesis also show a reversal of expression patterns, 
but in this case, decreasing during fetal development and rising after 
birth. In fetal development, energy metabolism seems to be slowing 
along with the decrease in cellular proliferation, consistent with cell 
division as the primary energy consuming process during fetal develop- 
ment. However, after birth, proliferation in the PFC continues to slow 
while expression of energy metabolism genes increases markedly. 
Other functional gene groups with increasing expression during these 
first postnatal months include genes involved in Ca²⁺ binding, Ca²⁺


©2011 Macmillan Publishers Limited. All rights reserved 




[Figure 2 graphic. Panel a: scatter of infant against fetal expression change;
x axis, fetal expression change (log₂, i.e., doublings per year). Quadrant
counts and highlighted gene groups: 432 genes (28.8%) down in fetal development
and up in infancy (ATP synthesis); 195 genes (13%) up in both stages (synapse);
202 genes (13.4%) down in both stages (cell cycle); 673 genes (44.8%) up in
fetal development and down in infancy (axon, miR-9 targets). Panel b: example
genes, including CCNB2; x axis, age (variable scale): fetal (GW), infant, child,
teen and adult (yr). Panel c: genes showing reversal in fetal and infant stages.]


Figure 2 | Reversal of fetal expression changes in infancy and ageing. 

a, Scatter plot of fetal and infant gene expression change with age. Each gene is 
represented as a single point. Only genes with slopes at P < 0.05 in both stages 
were included in this analysis (n = 1,502 genes measured by 1,819 probes: 
Supplementary Table 2). The number of genes in each quadrant is indicated in 
black. The P-values listed were derived from Pearson’s χ² tests comparing the
proportion of genes in each quadrant with an expected proportion of
0.25. Key functional gene groups are highlighted and listed in the quadrants 
where they are over-represented. b, Depiction of individual genes’ expression 
across the human lifespan, illustrating the four patterns of expression across 
fetal and infant development shown in a. The gene depicted in each panel is an 


transport, gated ion channels, voltage-gated K⁺ channels and active ion
transport (Supplementary Table 3), indicating that neuronal matura- 
tion and activity now drive energy production. 

This functional analysis of expression trajectories also reveals poten- 
tially novel mechanisms in early cortical development: in the heavily 
populated quadrant showing increasing expression in the fetus and 
decreasing expression in infancy, 22 of the top 49 over-represented 
gene groups are microRNA (miRNA) target gene groups (Supplemen- 
tary Table 3, P= 6.5 X 10 >and below). Together, these miRNA target 
groups account for 266 of the 673 genes in this quadrant (40%). miR-9 
targets are the most highly enriched of these miRNA target gene 
groups. miR-9 is brain-specific'* and is used reiteratively in diverse 
processes in neural development, including patterning, neurogenesis 
and differentiation’*', as well as cell migration”. 

The reversal of fetal expression trajectories is also seen much later in 
life. Fetal expression trajectories show a strong negative correlation 
with changes observed in the sixth decade of life (50s) (r= —0.46, 
P=24%X10 *'; Supplementary Table 1). This finding is consistent 
with the age-dependent repression of neuronal genes observed previ- 
ously'®. Whereas fetal expression trajectories show negative correla- 
tion with both infant and 50s trajectories, expression trajectories in 
infancy do not correlate with those observed in the 50s (Supplemen- 
tary Table 1). However, within the set of genes showing trajectory 
example drawn from gene groups highlighted in the corresponding quadrants
in a. Expression levels (normalized log₂(sample/reference)) are on the y axis,
with age on the x axis. The linear age scale is variable within each of the
individual stages of life as labelled. Subjects are coloured by age as in Fig. 1a.
GW, gestational week. Clockwise from top left: NADH dehydrogenase
(ubiquinone) 1 alpha subcomplex, 7, 14.5 kDa (NDUFA7; ATP synthesis);
γ-aminobutyric acid A receptor, α1 (GABRA1; synapse); stathmin-like 2
(STMN2; axon); and cyclin B2 (CCNB2; cell cycle). c, Visualization of ageing 
(50s) and infant expression trajectories among genes showing reversal between 
fetal development and infancy (same genes depicted in a, lower right and upper 
left panels, n = 1,105). The red line depicts a linear fit to the data. 


reversal between fetal and infant ages, expression change in infancy 
and in the 50s share a striking amount of similarity (Fig. 2c). Therefore, 
although infant expression changes do not globally resemble those 
happening later in life, the specific reversal of fetal expression trajectories 
seen in infancy is mirrored within changes in ageing. 

These fetal reversals in ageing can also be demonstrated by com- 
paring our observations in fetal development with recent findings in 
ageing. Genes with significant increases during fetal development are 
enriched for genes shown to decrease in the ageing cortex'’, whereas 
genes decreasing during fetal development are enriched for genes 
known to increase in ageing (P= 1.0 X10 ° and P=4.6X 10"'" 
respectively; see Supplementary Table 5). Similar reversals are also 
seen in genes reported to change in Alzheimer’s disease”: fetal 
increases are enriched for genes downregulated in Alzheimer’s disease 
and fetal decreases are enriched for genes upregulated in Alzheimer’s 
disease (P= 2.2 X 10-*' and P=7.1 X10 ’, respectively; see Sup- 
plementary Table 5). Hence, in the PFC, the reversal of specific 
expression patterns from in utero development occurs in infancy 
and then again much later in normal ageing and in the neuropatho- 
logical processes of Alzheimer’s disease. 

To explore the genetic control of prefrontal expression patterns, 
DNA from the sample collection was interrogated with high-density 
SNP microarrays to catalogue common genomic polymorphisms. 
All possible associations of SNP genotypes with gene expression levels
were examined (expression quantitative trait loci, or eQTL):
n = 30,176 expression probes × 625,439 SNP genotypes = 1.89 × 10¹⁰
(~19 billion) possible associations. Consistent with previous observa- 
tions, we see that individual SNPs can profoundly affect the expression 
level of individual genes. When considering data across all subjects, 
1,628 individual associations surpass genome-wide Bonferroni correc- 
tion. Association analysis was also conducted within the African 
American and Caucasian samples separately (significant associations 
for all analyses are in Supplementary Table 6). 
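
The size of the search space and the genome-wide Bonferroni threshold follow from simple arithmetic on the probe and SNP counts given above. A minimal sketch (variable names are illustrative, not taken from the authors' code):

```python
import math

n_probes = 30_176   # expression probes retained after filtering
n_snps = 625_439    # SNP genotypes retained after quality control

# Every probe is tested against every SNP.
n_tests = n_probes * n_snps

# Bonferroni: divide the desired family-wise alpha by the number of tests.
alpha = 0.05
threshold = alpha / n_tests

# The same threshold on the -log10(P) scale used in Fig. 3a.
neg_log10_threshold = -math.log10(threshold)

print(f"{n_tests:.3g} tests, threshold P = {threshold:.2g}, "
      f"-log10(P) = {neg_log10_threshold:.1f}")
```

This reproduces the roughly 19 billion possible associations and the value of 11.6 quoted for the y axis of Fig. 3a.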

The strength and location of associations relative to transcriptional 
start sites (TSS) are explored in Fig. 3a. Consistent with past eQTL 
studies across many organisms, we find that effects proximal to TSSs 
are of greater average strength than associations across greater 
Figure 3 | Genetic control of PFC gene expression. a, Position and strength
of associations of SNP genotype with gene expression levels (distance of the
SNP from the associated TSS is plotted on the x axis). Only gene-SNP pairs
lying on the same chromosome are displayed here. Negative log₁₀(P value) is on
the y axis. Only P values <0.0001 are included in this analysis. The genome-
wide Bonferroni-corrected P = 0.05 is shown as a horizontal dashed grey line:
P = 0.05/1.89 × 10¹⁰ associations = 2.6 × 10⁻¹² (11.6 on this y-axis scale). The
numbers of associations listed in the plot refer to those passing this genome- 
wide alpha level. Both solid blue curves depict a local nonlinear regression fit 
(loess, span = 0.5) of association strength across distance from the TSS. The 
blue fit lying close to the y axis is a fit plotted in the same scale as the plotted P 
values (y axis at left). The second blue fit is the same fit, plotted on an expanded 
y axis (y axis at right). The minimum value of this fit both upstream and 
downstream from the TSS (that is, approximately where the local estimate of 
enrichment for greater association reaches zero) is marked with a vertical blue 
dashed line. b, The most significant observed association of a single SNP with 
the expression of one gene across the lifespan (highlighted in a in green): the 
rs1045599 SNP lies within the ZSWIM7 gene on chromosome 17 (ZSWIM7, 
zinc finger, SWIM-type containing 7). Age scales are defined as in Fig. 2b. 
ZSWIM7 expression level (normalized log₂(sample/reference)) is on the y axis
and is coloured by rs1045599 genotype. 
distances or across chromosomes (Fig. 3a). There are considerably
more strong associations downstream (3’) from the TSS than upstream. 
This is consistent with previous observations®, and demonstrates that 
downstream polymorphisms (often within gene sequences) that impact 
on expression are stronger and/or more numerous than alterations at 
equal distances upstream (potentially in promoter or enhancer 
sequences). Additionally, expression-associated SNPs are biased 
towards positions within genes (fold enrichment = 1.61, 
P=2.9X 10°”°): Within this gene bias, both exonic and intronic loca- 
tions are over-represented, but to vastly different degrees (fold enrich- 
ment = 4.3 and 1.4,P=5.0X 10 “*and12X10 *, respectively). 

The single strongest association observed was between the expres- 
sion of the ZSWIM7 gene and SNP rs1045599, located within this same 
gene (Fig. 3a, b). This association of genotype with expression level is 
observed across all ages and races studied. Similar to this analysis, the 
freely available interactive stand-alone application that we have 
developed enables the visualization of expression data across the 
lifespan and the exploration of genetic associations for individual gene 
queries (http://www.libd.org/braincloud). We invite the research community
to explore this resource according to their own interests.

To explore the relationship between the genome as a whole and the 
PFC transcriptome as a whole, we compared genetic distance and 
transcriptional distance in all possible pairwise subject comparisons 
(Fig. 4). Although individual SNPs clearly have an impact on the
expression of individual genes (Fig. 3 and Supplementary Table 6),
globally there is no association of genetic distance between individual
humans with the similarity of their prefrontal transcriptional profiles
(Fig. 4, R² = 0.002).

This dramatic lack of association between genetic distance and 
transcriptome distance across our sample is a surprising result that 
requires further interrogation. It is possible that no association is found 
in Fig. 4 because most of the genetic polymorphisms measured do not 
impact on gene expression. Therefore, we repeated this search for 
association by investigating global transcriptional distance across a 
focused subset of the genetic data: only SNPs involved in genome-wide 
significant SNP-expression associations were considered. This ana- 
lysis also revealed no association between focused genetic distance 
Figure 4 | The genome produces a consistent molecular architecture in PFC.
Global comparison of genetic and transcriptional differences between subjects. 
Each point represents a comparison of two subjects in the collection. Genetic 
distance between subjects is depicted on the x axis as the number of differing 
alleles over the portion of the genome interrogated. Transcriptional distance is 
shown on the y axis as 1 minus the correlation across all gene expression values 
from the subjects (as used in Fig. 1c). Each subject comparison is coloured to 
indicate the races (AA, African American; Cauc., Caucasian) of the two 
individuals involved in the comparison. The thick black curve is an estimate of 
the local mean (loess, span = 0.25) of transcriptional distance as it varies across 
genetic distance. The thin black curves depict fits to the residuals around this 
mean. Only African American and Caucasian sample comparisons are 
visualized here (>96% of the collection). 
and global transcriptional distance (Supplementary Fig. 2). In addition,
these same analyses performed within individual races showed no
association between global transcriptional distance and genetic dis- 
tance when either global or focused genetic distance was used. 

We conclude that despite the many genetic polymorphisms that 
individually can affect the expression of single genes, the human 
genome produces a consistent molecular architecture in the human 
prefrontal cortex across the lifespan. This is true across (the human) 
race. It is possible that individual genetic traits and complex combina- 
tions of traits that disrupt this architecture are selected against in the 
general population and would not appear in studies of normal human 
brain development. The clear observation of associations of individual 
genetic polymorphisms with gene expression (Fig. 3) in the absence of a
relationship between global genetic and transcriptome profiles (Fig. 4) 
demonstrates our ability to analyse microscale genetic effects while 
macroscale interactions remain elusive. It is perhaps useful to consider 
each individual complete genome as a grand combination of variants 
which is acted upon (in evolution and in environment) and which acts 
(in development, biological function and disease) as a whole, rather 
than individual genetic traits in isolation. Characterization of the 
higher-order interactions within this whole is a great challenge facing 
biologists today. 

By creating this freely available public resource, we hope that
the research community can further explore this data set. The full
data set is downloadable at
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE30272 (expression data) and
http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000417.v1.p1
(SNP data). In addition, we have constructed a biologist-friendly
stand-alone application designed to allow the research community
to interrogate this resource one gene at a time
(http://www.libd.org/braincloud).


METHODS SUMMARY 

Brain tissue collection. Post-mortem human brains from the NIMH Brain Tissue 
Collection in the Clinical Brain Disorders Branch (NIMH, CBDB) were obtained 
at autopsy. Additional brain tissue samples were provided by the NICHD Brain 
and Tissue Bank for Developmental Disorders (http://www.BTBank.org). Clinical 
characterization, diagnoses and macro- and microscopic neuropathological exam- 
inations, toxicological analysis, RNA extraction and quality control measures were 
performed using a standardized paradigm. Subjects with evidence of neuropatho- 
logy, drug use, alcohol abuse, or psychiatric illness were excluded. Subject demo- 
graphics and sample details are contained in Supplementary Table 7. 

RNA resources. Post-mortem PFC grey matter tissue homogenates were obtained 
from all subjects. Total RNA was extracted, amplified and fluorescently labelled. 
Reference RNA was pooled from all samples and treated identically to sample 
RNAs. Labelled RNAs were hybridized to two-colour custom-spotted arrays from 
the NHGRI microarray core facility. After normalization’, log₂ intensity ratios
were further adjusted to reduce the impact of known and unknown sources of 
systematic noise on gene expression measures using surrogate variable analysis” 
(SVA). Validation of microarray expression patterns was performed by Taqman 
qPCR (Supplementary Table 8). In this study of RNA derived from tissue homo- 
genates, differential gene expression within a population of cells stable in cell type 
is indistinguishable from a change in the abundance of cell types that express 
different genes. There is no doubt that both phenomena contribute to signals 
measured here in the prefrontal cortex. 

DNA resources. DNA for genotyping was obtained from the cerebella of samples 
in the collection and applied to Illumina BeadArrays. Genotypes were called using
BeadExpress software. 

Functional gene groupings. To generate functional gene groups for the analysis 
described in Fig. 2a and the text, microarray probes were annotated with data from 
numerous public, online sources. Enrichment of functional gene groups within 
various gene lists as described in the text was assessed by a standard hypergeo- 
metric test. 

SNP-expression associations. SNP-expression associations were carried out 
using linear models that included surrogate variables, age, life stage (as defined 
in Fig. 1b), an interaction of age and life stage, sex, race and the SNP under 
investigation. SNP was included as a continuous variable. 
Full Methods and any associated references are available in the online version of
the paper at www.nature.com/nature. 
METHODS

Human post-mortem brain tissue collection. The NIMH Brain Tissue 
Collection in the Clinical Brain Disorders Branch (NIMH, CBDB) obtained 
post-mortem human brains at autopsy primarily from the Offices of the Chief 
Medical Examiner of the District of Columbia, and of the Commonwealth of 
Virginia, Northern District, all with informed consent from the legal next of kin 
(protocol 90-M-0142 approved by the NIMH/NIH Institutional Review Board). 
Additional post-mortem fetal, infant, child and adolescent brain tissue samples 
were provided by the National Institute of Child Health and Human Development 
Brain and Tissue Bank for Developmental Disorders (http://www.BTBank.org) 
under contracts NO1-HD-4-3368 and NO1-HD-4-3383. The Institutional Review 
Board of the University of Maryland at Baltimore and the State of Maryland 
approved the protocol, and the tissue was donated to the NIMH under the terms 
of a Material Transfer Agreement. Clinical characterization, diagnoses, and 
macro- and microscopic neuropathological examinations were performed on all 
CBDB cases using a standardized paradigm. Details of tissue acquisition, handling, 
processing, dissection, clinical characterization, diagnoses, neuropathological 
examinations, RNA extraction and quality control measures were described
previously”’. The Brain and Tissue Bank cases were handled in a similar fashion
(http://medschool.umaryland.edu/BTBank/ProtocolMethods.html). Toxicological
analysis was performed on every case. Subjects with evidence of macro- or microscopic
neuropathology, drug use, alcohol abuse, or psychiatric illness were excluded. 
Subject demographics and sample details are contained in Supplementary Table 7. 
RNA resources and analysis. Post-mortem tissue homogenates of PFC grey 
matter (DLPFC, that is, BA46/9 in postnatal samples and the corresponding region 
of PFC in fetal samples) were obtained from all subjects (n = 269 after all exclusion 
criteria). Total RNA was extracted from ~100 mg of tissue using the RNeasy kit 
(Qiagen) according to the manufacturer’s protocol. Samples with RNA integrity 
number (RIN) <5 were excluded. 500 ng of each total RNA sample was reverse 
transcribed with an oligo dT-T7 and amplified (T7) using the Ambion 
MessageAmp II kit (catalogue no. 1753, Ambion). The generated aminoallyl 
UTP-labelled antisense RNAs (aRNAs) were then coupled with Cy3 mono 
NHS ester CyDye from GE Healthcare. Reference RNA was pooled from all 
samples and was treated identically to sample RNAs, but was labelled with the 
Cy5 fluorescent dye. Two-colour custom-spotted oligonucleotide microarrays 
from the NHGRI microarray core facility using the Illumina Oligoset 
(HEEBO7) of 49,152 70-mer probes were used. After purification, the labelled
aRNAs were hybridized overnight to the oligo arrays in 5× SSC, 25% formamide
and 0.2% SDS buffer at 45°C using Maui Mixer FL hybridization chambers
(BioMicro Systems). The slides were then washed at room temperature in a series
of SSC/SDS buffers and dried by centrifugation. A laser confocal scanner (Agilent 
Technologies) was used to scan the hybridized microarrays. DeArray software 
(Scanalytics, Inc.) was used to export intensity data. Probes that were non-human, 
nonspecific (that is, mapped to >1 expressed sequence), incorrectly annotated, or 
probes containing polymorphisms with minor allele frequency > 0.01 according 
to HapMap in either YRI or CEU populations were removed from the analysis. 
Intensities below an empirically determined low intensity cutoff of 5.3 on the log₂
scale were dropped from the data. Probes with fewer than half of the fetal or 
postnatal data points remaining after this step were removed. Additionally, outliers 
defined as >6 mean average deviations from the age-appropriate linear fit were 
removed. The total number of probes remaining was 30,176. After background 
correction on the linear scale, log₂ ratios (sample/reference) were normalized
across mean log₂ fluorescent intensities using loess correction”’. Missing data in
the gene expression data matrix were imputed at this stage to enable both SVA and
PCA. After normalization, log₂ ratios were further adjusted to reduce the impact of
known and unknown sources of systematic noise on gene expression measures
using SVA”. Two surrogate variables were generated and used to adjust log₂ ratios
in all subsequent linear models. Correlations between the naively created surrogate
variables and known sources of noise were evident: SV1+ RIN: r= 0.37, 
P=47 X10! $V2+ ArrayBatch: r = 0.73; P<2 X 107 '°. All of these micro- 
array data analyses were conducted using custom code and tools from the 
Bioconductor project (http://www.bioconductor.org/) in the R statistical language 
(http://www.r-project.org/). Validation of microarray expression patterns was 
performed by Taqman qPCR (Supplementary Table 8). 

DNA resources and analysis. DNA for genotyping was obtained from the cerebella 
of 266 of the total 269 samples in the collection and applied to either Illumina
Infinium II 650K or Illumina Infinium HD Gemini 1M Duo BeadChips according 
to manufacturer’s protocols. Only genotypes common to both platforms are 


analysed here. Genotypes were called using BeadExpress software. SNPs were 
removed if the call rate was <98% (mean call rate for this study >99%), if not in 
Hardy-Weinberg equilibrium (P<0.001) within Caucasian and within African 
American races separately, or not polymorphic (MAF <0.01). The total number 
of SNPs remaining in the analysis was 625,439 (96.2%). 
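
The three SNP filters described above (call rate, Hardy-Weinberg equilibrium and minor allele frequency) can be sketched for a single SNP as follows. This is an illustrative sketch under stated assumptions: genotypes are coded as 0, 1 or 2 copies of one allele with `None` for a failed call, the Hardy-Weinberg test is a 1-d.f. χ² goodness-of-fit test (the text does not say which test was used), and the function name is hypothetical. In the study the equilibrium check was applied within each race separately, so this function would be run once per group.

```python
import math


def snp_passes_qc(genotypes, min_call_rate=0.98, min_maf=0.01, hwe_alpha=0.001):
    """Return True if one SNP passes call-rate, MAF and HWE filters.

    genotypes: list of 0/1/2 allele counts, with None for a no-call.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    call_rate = len(called) / len(genotypes)

    n = len(called)
    observed = [called.count(k) for k in (0, 1, 2)]
    p = (observed[1] + 2 * observed[2]) / (2 * n)  # frequency of the counted allele
    maf = min(p, 1 - p)

    # Genotype counts expected under Hardy-Weinberg equilibrium.
    expected = [n * (1 - p) ** 2, 2 * n * p * (1 - p), n * p ** 2]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    hwe_p = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 d.f.

    return call_rate >= min_call_rate and maf >= min_maf and hwe_p >= hwe_alpha
```

For example, a SNP typed in 100 subjects with genotype counts 50/40/10 passes all three filters, whereas one with 5% missing calls or with only heterozygous calls does not.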

Expression turning points. For each probe, a linear-spline model of expression 
across age was fit with a single change point. The change point was allowed to vary 
across the entire age range, and the change point that produced the lowest mean- 
squared error was selected as the expression turning point for that probe. Data 
within the first and last decade of the range interrogated (0-10 and 70-80 years) 
were excluded to avoid edge effects (n = 7,272 probes). See Fig. 1b (grey histogram). 
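
The turning-point search can be illustrated with a small grid search. The sketch below assumes a continuous linear spline of the form y = a + b·x + d·max(x − c, 0) and tries each interior observed age as the candidate change point c; the parametrization, the candidate grid and all function names are assumptions made for illustration, not the authors' code.

```python
def solve_linear(a, b):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def fit_hinge(ages, values, knot):
    """Least-squares fit of y = a + b*x + d*max(x - knot, 0); returns (coeffs, mse)."""
    rows = [(1.0, float(x), max(x - knot, 0.0)) for x in ages]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    beta = solve_linear(ata, aty)
    residuals = [y - sum(c * v for c, v in zip(beta, r)) for r, y in zip(rows, values)]
    return beta, sum(e * e for e in residuals) / len(residuals)


def turning_point(ages, values):
    """Return the interior age whose hinge fit gives the lowest mean-squared error."""
    candidates = sorted(set(ages))[1:-1]  # skip the extremes, where the fit is degenerate
    return min(candidates, key=lambda c: fit_hinge(ages, values, c)[1])
```

On synthetic data that rise linearly until age 8 and decline afterwards, `turning_point` returns 8. The exclusion of the first and last decade described in the text is not implemented here.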
MDS using expression correlation as a distance metric. This representation (see 
Fig. 1c) was generated using 1 − r as a distance metric, where r is the pairwise
Pearson’s correlation coefficient calculated across all gene expression probes for 
each pair of samples. These distances were coupled with an MDS algorithm to 
attempt to satisfy all the pairwise distances in two-dimensional space. For both 
MDS and PCA, three-dimensional analyses more precisely depict systematic age 
effects (Supplementary Fig. 1, parts 2 and 3). It is important to note in this analysis 
(and those shown in Figs 1b and 4) that because expression data are expressed as a 
ratio to reference here, the mean expression correlation across all samples is near 
zero: r = 0.02. 
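
The 1 − r distance that feeds the MDS can be sketched as below. Each sample is represented as a list of expression values in the same probe order; the function names are illustrative, and the MDS step itself is not shown.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_distance_matrix(samples):
    """Symmetric matrix of 1 - r distances between all pairs of samples."""
    n = len(samples)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = 1.0 - pearson_r(samples[i], samples[j])
    return dist
```

Under this metric, perfectly correlated samples lie at distance 0, uncorrelated samples at 1 and perfectly anticorrelated samples at 2.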

Functional gene groupings. To generate functional gene groups for the analysis 
described in Fig. 2a and the text, we annotated all probes with data from Kyoto 
Encyclopedia of Genes and Genomes Pathways (http://www.genome.jp/kegg), the 
Gene Ontology project (http://www.geneontology.org), the Pfam database (http:// 
www.sanger.ac.uk/Software/Pfam), mouse knockout phenotypes and human disease 
phenotypes collected by Kevin Becker’s group at the National Institute on Ageing”*”, 
the GSA project at Stanford (http://www-stat.stanford.edu/~tibs/GSA) and the 
GSEA project at the Broad Institute (http://www.broad.mit.edu/gsea), the HPRD 
project (http://www.hprd.org), as well as many groups collected from diverse sources 
at NCBI (http://www.ncbi.nlm.nih.gov), including protein-protein interactions and 
miRNA binding motifs**”’. Compilation of functional information from all of these 
sources and considering only gene groups of size 3-1,000 resulted in 23,810 partially 
redundant and overlapping functionally related gene groups. Enrichment of func- 
tional gene groups within various gene lists as described in the text was assessed by a 
standard hypergeometric test. 
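
A hypergeometric enrichment test of this kind can be sketched as follows. The one-sided upper tail and the choice of gene universe are assumptions, since the text says only that a standard hypergeometric test was used; the function and argument names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_group, n_list, n_overlap):
    """Upper-tail hypergeometric P value for over-representation.

    n_universe: genes considered in total
    n_group:    genes in the functional group
    n_list:     genes in the list being tested (for example, one quadrant)
    n_overlap:  genes present in both the group and the list
    """
    upper = min(n_group, n_list)
    total = comb(n_universe, n_list)
    tail = sum(
        comb(n_group, k) * comb(n_universe - n_group, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

As a toy check, with a universe of 10 genes, a group of 4 and a list of 3, the probability of seeing at least 2 group members in the list is 40/120 = 1/3.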

During the exploration of data for Fig. 2, it was observed that there is significant 
correlation between the age and sex variables within the first 6 months of life 
(r= 0.5). To ensure that this correlation was not responsible for the discoveries 
in Fig. 2, the entire analysis was repeated while adjusting expression measures for 
sex. All the findings detailed in Fig. 2 were replicated in this verification analysis. 
Calculation of SNP-expression associations. SNP-expression associations 
referred to in Fig. 3 and Supplementary Table 6 were carried out using linear 
models that included surrogate variables generated as described above, age, life 
stage (as defined in Fig. 1b), an interaction of age and life stage, sex, race, and the 
SNP under investigation. The SNP was included as a continuous variable, coded as
1, 2, 3; that is, an additive or ‘dosage’ model rather than a categorical or
‘co-dominant’ model was used. Association methods used to generate hits for the
genetic distance used in Supplementary Fig. 2, part 1, were identical to those 
described above except for the omission of the race and sex terms in the linear 
model. To confirm that this analytical framework is capable of discovering asso- 
ciations between genetic and transcriptional metrics as analysed here, we con- 
ducted a positive control analysis (Supplementary Fig. 2, part 2; negative control 
also included). 
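
The additive coding can be illustrated with a stripped-down example. In the sketch below, which allele defines the code is an assumption (minor-allele count plus one), the covariates of the full model (surrogate variables, age, life stage, their interaction, sex and race) are omitted, and the helper names are hypothetical.

```python
def dosage(genotype, counted_allele):
    """Additive code 1, 2 or 3: copies of the counted allele plus one (assumed coding)."""
    return genotype.count(counted_allele) + 1


def ols_slope(x, y):
    """Slope of a single-predictor least-squares fit of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


# Toy data: four subjects, expression rising by 0.5 units per copy of G.
genotypes = ["AA", "AG", "GG", "AG"]
expression = [0.0, 0.5, 1.0, 0.5]
codes = [dosage(g, "G") for g in genotypes]
per_allele_effect = ols_slope(codes, expression)
```

Treating the code as continuous estimates one per-allele effect, whereas a categorical model would estimate a separate mean for each genotype class.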
Epigenetic pathways can regulate gene expression by controlling
and interpreting chromatin modifications. Cancer cells are charac- 
terized by altered epigenetic landscapes, and commonly exploit the 
chromatin regulatory machinery to enforce oncogenic gene expres- 
sion programs’. Although chromatin alterations are, in principle, 
reversible and often amenable to drug intervention, the promise of 
targeting such pathways therapeutically has been limited by an 
incomplete understanding of cancer-specific dependencies on epi- 
genetic regulators. Here we describe a non-biased approach to 
probe epigenetic vulnerabilities in acute myeloid leukaemia 
(AML), an aggressive haematopoietic malignancy that is often 
associated with aberrant chromatin states”. By screening a custom 
library of small hairpin RNAs (shRNAs) targeting known chro- 
matin regulators in a genetically defined AML mouse model, we 
identify the protein bromodomain-containing 4 (Brd4) as being 
critically required for disease maintenance. Suppression of Brd4 
using shRNAs or the small-molecule inhibitor JQ1 led to robust 
antileukaemic effects in vitro and in vivo, accompanied by terminal 
myeloid differentiation and elimination of leukaemia stem cells. 
Similar sensitivities were observed in a variety of human AML cell 
lines and primary patient samples, revealing that JQ1 has broad 
activity in diverse AML subtypes. The effects of Brd4 suppression 
are, at least in part, due to its role in sustaining Myc expression to 
promote aberrant self-renewal, which implicates JQ1 as a phar- 
macological means to suppress MYC in cancer. Our results establish 
small-molecule inhibition of Brd4 as a promising therapeutic 
strategy in AML and, potentially, other cancers, and highlight the 
utility of RNA interference (RNAi) screening for revealing epige- 
netic vulnerabilities that can be exploited for direct pharmaco- 
logical intervention. 

AML represents a paradigm for understanding how complex patterns
of cooperating genetic and epigenetic alterations lead to tumorigenesis.
Although this complexity poses a challenge for the development of
targeted therapies, diverse gene mutations in AML generally converge
functionally to deregulate similar core cellular processes. One key event
in AML initiation is the corruption of cell-fate programs to generate
leukaemia stem cells that aberrantly self-renew and thereby maintain
and propagate the disease. Although it is incompletely understood, this
process has been linked to changes in regulatory chromatin modifica-
tions. For example, common AML oncogenes such as those encoding
the AML1-ETO and MLL fusion proteins induce self-renewal programs
at least in part through reprogramming of epigenetic pathways. In
addition, several genes encoding epigenetic regulators have been iden-
tified as targets of somatic mutation in AML. Because epigenetic
alterations induced by oncogenic stimuli are potentially reversible,
chromatin regulators are being explored as candidate drug targets.

To probe epigenetic pathways required for AML maintenance sys- 
tematically, we built a custom shRNA library targeting 243 known 
chromatin regulators, including most ‘writers’, ‘readers’ and ‘erasers’ 
of epigenetic marks (Supplementary Fig. 1 and Supplementary Table 1). 
This library of 1,094 shRNAs (3-6 per gene) was constructed in 
TRMPV-Neo, a vector optimized for negative-selection RNAi screen- 
ing, and was transduced as one pool into an established Tet-on- 
competent AML mouse model driven by MLL-AF9 and NrasG12D
(ref. 10). After drug selection, shRNA expression was induced by addi-
tion of doxycycline, and changes in library representation after 14 days
of culture were monitored using deep sequencing of shRNA guide
strands amplified from genomic DNA (Fig. 1a and Supplementary
Fig. 2). Using the scoring criterion of more than twenty-fold depletion
in each of two independent replicates, 177 shRNAs were strongly
depleted. These included all eight positive-control shRNAs targeting
essential genes (Rpa1, Rpa3, Pcna and Polr2b), as well as several shRNAs
targeting two known MLL-AF9 cofactors (Men1 and Psip1). Genes
for which at least two independent shRNAs scored were subjected to
extensive one-by-one validation using an independent MLL-AF9/
NrasG12D AML cell line and vector system (Supplementary Fig. 3a).
In both the primary screen and validation stages, several shRNAs tar-
geting Brd4 were among the most strongly depleted, identifying this
gene as the top scorer in the screen (Fig. 1a and Supplementary Fig. 3b).
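
The scoring rule described above (more than twenty-fold depletion in each of two replicates, then at least two independent shRNAs per gene) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the shRNA identifiers, the read counts and the pseudocount are hypothetical and are not taken from the screen data.

```python
from collections import defaultdict


def fold_depletion(t0_reads, t14_reads, pseudocount=1.0):
    """Ratio of normalized T0 to T14 reads; larger means stronger depletion.

    The pseudocount is an assumption added here to avoid division by zero.
    """
    return (t0_reads + pseudocount) / (t14_reads + pseudocount)


def score_screen(counts, threshold=20.0, min_shrnas=2):
    """counts maps shRNA id -> (gene, [(t0, t14) for each replicate])."""
    hits_by_gene = defaultdict(list)
    for shrna, (gene, replicates) in counts.items():
        # An shRNA scores only if it exceeds the threshold in every replicate.
        if all(fold_depletion(t0, t14) > threshold for t0, t14 in replicates):
            hits_by_gene[gene].append(shrna)
    # Keep genes supported by at least `min_shrnas` independent shRNAs.
    return {g: sorted(s) for g, s in hits_by_gene.items() if len(s) >= min_shrnas}


# Hypothetical read counts for illustration only.
example = {
    "Brd4.552": ("Brd4", [(1000, 10), (1200, 20)]),
    "Brd4.1448": ("Brd4", [(900, 5), (800, 15)]),
    "Ren.713": ("Renilla", [(1000, 950), (1100, 1000)]),
    "GeneX.1": ("GeneX", [(1000, 10), (1000, 400)]),
}
print(score_screen(example))
```

In this toy input only the two hypothetical Brd4 shRNAs pass in both replicates, so Brd4 is the only gene carried forward; GeneX.1 fails because it is depleted in one replicate only.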

Brd4 is a member of the BET family of bromodomain-containing 
proteins that bind to acetylated histones to influence transcription.
BRD4 is also a proto-oncogene that can be mutated via chromosomal
translocation in a rare form of squamous-cell carcinoma, although a
role in leukaemia has not been described. The recent development of
small-molecule BET bromodomain inhibitors, together with our
screening results, prompted us to investigate the suitability of Brd4 as 
an AML drug target. Five independent Brd4 shRNAs showed a close 
correspondence between knockdown efficiency and growth inhibition, 
indicating on-target effects (Fig. 1b). Suppression of Brd4 led to cell- 
cycle arrest and apoptosis of leukaemia cells, whereas the equivalent 
knockdown in immortalized murine embryonic fibroblasts (MEFs) led 
to only modest cell-cycle inhibition without cytotoxicity (Supplementary
Fig. 4a-d). Brd4 knockdown also failed to influence the growth of
non-transformed G1E erythroblast cells (Supplementary Fig. 4e). In
addition, shRNAs targeting BRD4 were sufficient to induce cell-cycle
arrest in two MLL-AF9+ human AML lines (Supplementary Fig. 5).
Together, these results indicate that Brd4 is a critical requirement in 
MLL-AF9-induced AML. 

Figure 1 | AML growth is sensitive to Brd4

Next, we examined the sensitivity of leukaemia cells to JQ1, a first-
in-class small-molecule inhibitor of BET bromodomains with highest
affinity for the first bromodomain of Brd4 (ref. 15). Proliferation of
mouse MLL-fusion leukaemia cells was notably sensitive to sub-
micromolar JQ1 concentrations, as compared to proliferation of fibro-
blasts and G1E cells (Fig. 1c), correlating with the relative impact of
Brd4 shRNAs on proliferation of these different cell types. We also
examined growth-inhibitory effects of JQ1 in a series of established
human leukaemia cell lines as well as in adult and paediatric primary
leukaemia samples. We observed broad growth-suppressive activity of
JQ1 (IC50 < 500 nM) in 13 out of 14 AML cell lines (Fig. 1d and Sup-
plementary Fig. 6a, b) and in 12 out of 15 primary human AML
samples, representing diverse disease subtypes (Supplementary Figs
7, 8). In addition, all three primary MLL-rearranged infant leukaemias
tested were sensitive to JQ1 (Supplementary Fig. 8), whereas other
non-AML-leukaemia and solid-tumour cell lines showed minimal
sensitivity to the compound (Fig. 1d and Supplementary Fig. 6c). In
all AML lines examined, JQ1 treatment triggered cell-cycle arrest and
apoptosis, similar to the effects seen upon shRNA-mediated Brd4
knockdown (Fig. 1e, f and Supplementary Figs 7-9). Together, these
data indicate a critical requirement for Brd4 in AML proliferation that
can be effectively inhibited using the bromodomain inhibitor JQ1.

We next investigated the relevance of Brd4 to AML progression in 
vivo. To suppress Brd4 in established AML in mice, Tet-on-competent 
MLL-AF9/NrasG12D leukaemia cells were transduced with TRMPV-
Neo constructs containing Brd4 shRNAs or control shRNAs, and
transplanted into secondary recipient mice (Supplementary Fig. 10). 
After disease onset, confirmed by bioluminescent imaging, shRNA 
expression was induced by doxycycline administration. Subsequent 
monitoring showed that Brd4 knockdown resulted in a marked delay
in leukaemia progression and a survival benefit (Fig. 2a-c and
Supplementary Fig. 11). Taking advantage of the dsRed reporter linked
to shRNA expression in the TRMPV-Neo vector, flow-cytometry
analysis verified that Brd4-shRNA-positive cells were depleted in the
terminal leukaemia burden as compared to controls, indicating that
the mice succumbed to an outgrowth of Brd4-shRNA-negative cells
(Fig. 2d, e). Together, these data indicate that RNAi-mediated sup-
pression of Brd4 inhibits leukaemia progression in vivo.

[Figure 1 legend, continued; x axis: JQ1 concentration (nM)]
JQ1-treated cells, calculated by measuring the
increase in viable cell number after 72 h in culture
and fitting data to an exponential growth curve.
Results are normalized to the proliferation rate of
vehicle/DMSO-treated cells, set to 1 (n = 3). CML-
BC, chronic myeloid leukaemia blast crisis;
T-ALL, T-cell acute lymphoblastic leukaemia.
e, f, Percentage of cells in S-phase
(bromodeoxyuridine (BrdU)+) after JQ1
treatment for 48 h at the indicated concentrations
(n = 3). BrdU was pulsed for 30 min in all
experiments shown. All error bars represent s.e.m.

To examine whether JQ1 has single-agent activity in AML, mice 
transplanted with MLL-AF9/NrasG12D leukaemia cells were treated
either with daily injections of JQ1 (50 mg kg^-1) or with vehicle. JQ1
administration led to a marked delay in disease progression and sig- 
nificantly extended survival (Fig. 2f-h). JQ1 also showed single-agent 
activity in an intervention setting, in which treatment was initiated 
only after disease was detected by bioluminescent imaging (Fig. 2i and 
Supplementary Fig. 12). Comparable effects were observed in an inde- 
pendent AML mouse model based on expression of AML1-ETO9a 
and NrasG12D and loss of p53 (ref. 17), which is known to be insensitive
to conventional chemotherapy (Supplementary Fig. 13). Consistent
with previous findings, JQ1 treatment was well tolerated in mice,
with little if any impact on normal haematopoiesis (Supplementary
Figs 14-16). Collectively, these findings demonstrate that JQ1 has
potent and leukaemia-specific effects as a single agent in vivo. 

AML is characterized by an expanded self-renewal capacity linked 
with an inability to complete terminal myeloid differentiation. 
Therefore, we next considered whether Brd4 influences the differenti- 
ation state of leukaemia cells. Both expression of Brd4 shRNA and JQ1 
treatment altered the morphology of MLL-AF9/NrasG12D leukaemia
from myelomonocytic blasts to cells with a macrophage-like appear-
ance (Fig. 3a, b and Supplementary Fig. 17a). Upon Brd4 inhibition,
leukaemia cells showed increased surface expression of integrin αM
(Itgam, also known as Mac-1), a myeloid differentiation marker, and
decreased expression of Kit, a marker associated with leukaemia stem
cells (LSCs) in mouse models of MLL-rearranged leukaemia (Fig. 3c, d
and Supplementary Fig. 17b, c). In addition, JQ1 treatment induced
morphological signs of maturation in most of the primary
leukaemia samples tested, albeit to varying degrees (Supplementary
Figs 7 and 8).

Figure 2 | Brd4 is required for AML progression in vivo. a, Bioluminescent
imaging of mice transplanted with MLL-AF9/NrasG12D leukaemia cells
harbouring the indicated TRMPV-Neo-shRNAs. Doxycycline was
administered upon disease onset, 6 days after transplant. Day 0 indicates the
time of doxycycline treatment. b, Quantification of bioluminescent imaging
responses after doxycycline treatment. Mean values of four replicate mice are
shown. c, Kaplan-Meier survival curves of recipient mice transplanted with the
indicated TRMPV-Neo-shRNA leukaemia lines. The interval of doxycycline
treatment is indicated by the arrow. Each shRNA group contained 6-8 mice.
Statistical significance compared to shRen was calculated using a log-rank test;
*, P = 0.0001; **, P < 0.0001. d, Representative flow cytometry plots of donor-
derived (Cd45.2+) bone marrow cells in terminally diseased doxycycline-
treated mice. The gate shown includes dsRed+/shRNA+ cells. e, Percentage of
dsRed+/shRNA+ cells in the Cd45.2+ terminal leukaemia burden. Mean values
of four replicate mice are shown. f, Bioluminescent imaging of MLL-AF9/
NrasG12D leukaemia recipient mice at the indicated day after initiation of
treatment with JQ1 (50 mg kg^-1 d^-1) or DMSO carrier. g, Quantification of
bioluminescent imaging responses to JQ1 treatment. Mean values of six
DMSO- and seven JQ1-treated mice are shown. P values were calculated using a
two-tailed Student’s t-test. h, Kaplan-Meier survival curves of control and JQ1-
treated mice. Statistical significance was calculated using a log-rank test. In
f, g and h, JQ1 treatment was initiated on day 1 after transplant of 50,000
leukaemia cells. i, Quantification of bioluminescent imaging responses to JQ1
treatment in established disease. Treatment of leukaemic mice was initiated
6 days after transplant, when disease first became detectable by imaging. Mean
values of six DMSO- and seven JQ1-treated mice are shown. P values were
calculated using a two-tailed Student’s t-test. All error bars represent s.e.m.

Figure 3 | Brd4 inhibition leads to myeloid differentiation and leukaemia
stem-cell depletion. a, b, Light microscopy of May-Grünwald/Giemsa-stained
MLL-AF9/NrasG12D leukaemia cells after 2 days of doxycycline-induced
shRNA expression or 2 days of JQ1 treatment (100 nM). Expression of shRNA
was induced in TRMPV-Neo-transduced leukaemia cells. Imaging was
performed with a ×40 objective. Representative images of three biological
replicates are shown. c, d, Flow cytometry analysis of Mac-1 and Kit surface
expression after 4 days of shRNA expression or 2 days of JQ1 treatment
(100 nM). A representative experiment of three biological replicates is shown.
e-h, GSEA plots evaluating changes in macrophage and LSC gene signatures
upon Brd4 inhibition. In e and g, RNA for expression arrays was obtained from
sorted dsRed+/shRNA+ cells (shRen versus three different Brd4 shRNAs) after
2 days of doxycycline induction. In f and h, microarray data were obtained from
leukaemia cells treated for 2 days with DMSO or 100 nM JQ1. NES, normalized
enrichment score; FDR q-val, false discovery rate q-value (the probability that a
gene set with a given NES represents a false-positive finding).


To further investigate whether suppression of Brd4 affects the LSC 
compartment, we performed gene set enrichment analysis (GSEA) of 
expression microarray data obtained from Brd4-shRNA-treated and 
JQ1-treated leukaemia cells. GSEA revealed a marked upregulation of
macrophage-specific gene expression after Brd4 inhibition (Fig. 3e, f),
as well as global downregulation of a gene expression signature previ-
ously shown to discriminate LSCs from non-self-renewing leukaemia
cell populations (Fig. 3g, h). A similar profile of gene expression
changes was seen after JQ1 treatment of THP-1 cells, a human MLL- 
AF9-expressing AML cell line (Supplementary Fig. 18). Although we 
cannot exclude the involvement of additional cellular targets, the strong 
concordance between phenotypes induced by Brd4 shRNAs and JQ1 
supports Brd4 as the relevant target of JQ1 in AML. Together, these 
findings indicate that Brd4 is critically required to maintain LSC popu- 
lations and prevent terminal myeloid differentiation. 

Recent evidence indicates that the Myc transcriptional network has an 
important role in LSC self-renewal. Because previous studies also
implicate Myc as a potential downstream target of Brd4 (refs 24, 25), 
we examined whether this regulatory function was relevant to the anti-
leukaemic effects of JQ1. Brd4 inhibition with shRNAs or JQ1 led to a
marked reduction of mRNA and protein levels of Myc in MLL-AF9/
NrasG12D leukaemia, whereas these effects were minimal in MEF and
G1E cells (Fig. 4a, b and Supplementary Fig. 19a-c). Downregulation of
Myc mRNA levels occurred within 60 min of JQ1 exposure, qualitatively
preceding the increased expression of genes related to macrophage dif-
ferentiation, such as Cd74 (Fig. 4c). Supporting a direct role in Myc
transcriptional regulation, chromatin immunoprecipitation (ChIP)
experiments identified a region of focal Brd4 occupancy about 2 kilo-
bases upstream of the Myc promoter and this was eliminated after
exposure to competitive JQ1 (Fig. 4d). As expected, RNAi- or JQ1-
induced suppression of Brd4 led to a global reduction in expression of
Myc target genes (Supplementary Figs 19d and 20). Notably, JQ1
treatment triggered Myc downregulation in a broad array of mouse
and human leukaemia cell lines examined (Fig. 4a, b and Supplemen-
tary Fig. 21), indicating that JQ1 may provide a means to suppress the
Myc pathway in a range of AML subtypes.

Figure 4 | JQ1 suppresses the Myc pathway in leukaemia cells. a, Quantitative
reverse transcription PCR (qRT-PCR) of relative Myc mRNA levels in the
indicated mouse or human cells after a 48 h treatment with indicated JQ1 dose or
DMSO. Results were normalized to Gapdh, with the relative mRNA level in
untreated cells set to 1 (n = 3). b, Western blotting of whole-cell lysates prepared
from MLL-AF9/NrasG12D leukaemia cells treated for 48 h with DMSO or 250 nM
JQ1. A representative experiment of three biological replicates is shown. c, qRT-
PCR time course at indicated time points after treatment of MLL-AF9/NrasG12D
leukaemia cells with 250 nM JQ1. Results were normalized to Gapdh, with the
relative mRNA level in untreated cells set to 1 (n = 3). d, ChIP-qPCR performed
in MLL-AF9/NrasG12D leukaemia cells with the indicated antibodies and PCR
primer locations (n = 6 for DMSO; n = 4 for JQ1-treated). TSS, transcription start
site. e, Western blotting of whole-cell lysates prepared from MLL-AF9/NrasG12D
leukaemia cells transduced with empty vector or a Myc-cDNA-containing MSCV
retrovirus. Cells were treated for 48 h with DMSO or 250 nM JQ1. A representative
experiment of three biological replicates is shown. f, Light microscopy of May-
Grünwald/Giemsa-stained MLL-AF9/NrasG12D leukaemia cells transduced with
an empty vector or with the Myc cDNA. Cells were treated for 5 days with 50 nM
JQ1 and imaged using a ×40 objective. A representative image of three biological
replicates is shown. g, Quantification of BrdU incorporation after a 30-min pulse
in MLL-AF9/NrasG12D leukaemia cells transduced with empty control vector or
the Myc cDNA. Cells were treated with JQ1 for 5 days at the indicated
concentrations (n = 3). All error bars shown represent s.e.m.

To evaluate whether suppression of Myc confers the growth- 
inhibitory effects of JQ1, we generated MLL-AF9/NrasG12D leukaemia
cultures in which the Myc cDNA was ectopically expressed from a 
retroviral promoter, which resulted in Myc expression levels that were 
only slightly elevated but entirely resistant to JQ1-induced suppression 
(Fig. 4e and Supplementary Fig. 22a). Ectopic Myc expression con- 
ferred nearly complete resistance to macrophage differentiation and 
cell-cycle arrest induced by JQ1 and Brd4 shRNAs (Fig. 4f, g and 
Supplementary Figs 22b and 23). Furthermore, global gene-expression 
profiling revealed that most of the JQ1-induced transcriptional
changes are probably secondary effects of Myc downregulation (Sup- 
plementary Fig. 24). Ectopic Myc expression was unable to prevent 
JQ1-induced cell death, indicating that Brd4 has Myc-independent 
roles in regulating cell survival (Supplementary Fig. 22c, d). These 
findings collectively support a role for Brd4 in maintaining Myc 
expression to preserve an undifferentiated cellular state in AML. 

By taking a non-biased screening approach targeting epigenetic reg- 
ulators, our study has identified Brd4 as a critical factor required for 
AML disease maintenance. Because Brd4 is not evidently mutated or 
overexpressed in AML (Supplementary Fig. 25), the exquisite sensitivity 
of leukaemia cells to Brd4 inhibition would not have been revealed 
simply through genetic or transcriptional characterization of this 
disease. We further show that the bromodomain inhibitor JQ1 has 
broad activity in diverse AML contexts and, by comparing its effects to 
those induced by Brd4 shRNAs, we provide evidence that Brd4 is the 
primary target for the antileukaemic activity of JQ1. Of note, JQ1 is a
first-generation chemical inhibitor yet to be optimized for in vivo 
delivery, with a half-life of only about 1h in rodents (Supplemen- 
tary Fig. 26 and ref. 15). The more robust antileukaemic effects seen 
using Brd4 shRNAs in vivo indicate that second-generation deriva- 
tives of this compound may have greater clinical activity. Regardless, 
our results unambiguously highlight the utility of RNAi screening for 
revealing candidate drug targets in cancer. 

As a competitive inhibitor of the acetyl-lysine binding domain, JQ1 
interferes with the ability of Brd4 to ‘read’ histone acetylation marks that 
facilitate transcriptional activation. When applied to leukaemia cells,
JQ1 interferes with transcriptional circuits supporting self-renewal, thus
targeting LSCs and inducing terminal differentiation. In a parallel study,
we identified the transcription factor Myb as a critical mediator of
addiction to the MLL-AF9 oncogene. Notably, global gene-expression
changes observed after genetic or pharmacological inhibition of Brd4
are remarkably similar to those seen upon suppressing Myb, indicating
that Myb and Brd4 may intersect functionally in a common transcrip-
tional circuit that is essential for malignant self-renewal. A key down-
stream effector of both Myb and Brd4 is the oncoprotein Myc (ref. 27),
which has been validated as an attractive therapeutic target but has thus
far escaped efforts at pharmacological inhibition. Although the pre-
cise mechanism remains to be further defined, targeting Brd4 abolishes
Myc expression and limits self-renewal, with selectivity for the malig-
nant context, thus avoiding the haematopoietic toxicities that may be
associated with systemic Myc inhibition. As such, our study may
define a general strategy to disarm oncogenic pathways through the 
direct modulation of the epigenetic machinery. 


METHODS SUMMARY 

Pooled negative-selection RNAi screening. A customized shRNA library target- 
ing 243 chromatin-regulating mouse genes was designed using miR30-adapted 
BIOPREDsi predictions, and was generated by PCR-cloning a pool of oligonucleo- 
tides synthesized on 55k arrays (Agilent Technologies). Pools of shRNAs were 
subcloned into the TRMPV-Neo vector (Addgene catalogue no. 27990) together 
with control shRNAs, and transduced into Tet-on MLL-AF9/NrasG12D leukaemia
cells for negative-selection screening, essentially as described previously. All
shRNA sequences as well as primary screening data are provided in
Supplementary Table 1.

Animal studies. All mouse experiments were approved by the Cold Spring 
Harbor animal care and use committee. For conditional RNAi experiments in 
vivo, Tet-on MLL-AF9/NrasG12D leukaemia cells were transduced with TRMPV-
Neo-shRNA constructs, followed by transplantation into sub-lethally irradiated
recipient mice, as described previously. For shRNA induction, animals were
treated with doxycycline in both drinking water (2 mg ml^-1 with 2% sucrose;
Sigma-Aldrich) and food (625 mg kg^-1, Harlan laboratories). For JQ1 treatment
trials, a stock of 100 mg ml^-1 JQ1 in DMSO was diluted 20-fold by dropwise
addition of a 10% 2-hydroxypropyl-β-cyclodextrin carrier (Sigma) under vortex-
ing, yielding a 5 mg ml^-1 final solution. Mice were intraperitoneally injected daily
with freshly diluted JQ1 (50 or 100 mg kg^-1) or a similar volume of carrier con-
taining 5% DMSO.

Microarray analysis. Expression microarrays were performed using Affymetrix 
ST 1.0 GeneChips. Raw microarray data can be accessed at Gene Expression 
Omnibus, GSE29799. Pathway analysis was performed using GSEA v2.07 software 
with 1,000 phenotype permutations. All gene sets used for GSEA are provided in
Supplementary Table 2. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

Pooled negative-selection RNAi screening. A custom shRNA library targeting
243 chromatin-regulating mouse genes was designed using miR30-adapted 
BIOPREDsi predictions (six shRNAs per gene) and constructed by PCR-cloning 
a pool of oligonucleotides synthesized on 55k customized arrays (Agilent 
Technologies) as previously described. After sequence verification, 1,094
shRNAs (3-6 per gene) were combined with several positive- and negative-control
shRNAs at equal concentrations in one pool. This pool was subcloned into
TRMPV-Neo and transduced into Tet-on MLL-AF9/NrasG12D leukaemia cells
using conditions that predominantly lead to a single retroviral integration and 
represent each shRNA in a calculated number of >500 cells (a total of thirty 
million cells at infection, 2% transduction efficiency). Transduced cells were 
selected for 5 days using 1 mg ml^-1 G418 (Invitrogen); at each passage more than
twenty million cells were maintained to preserve library representation through-
out the experiment. After drug selection, T0 samples were obtained (~twenty
million cells per replicate) and cells were subsequently cultured with 0.5 mg ml^-1
G418 and 1 µg ml^-1 doxycycline to induce shRNA expression. After 14 days
(12 passages, T14), about fifteen million shRNA-expressing (dsRed+/Venus+)
cells were sorted for each replicate using a FACSAriaII (BD Biosciences).
Genomic DNA from T0 and T14 samples was isolated by two rounds of phenol
extraction using PhaseLock tubes (5prime) followed by isopropanol precipitation.
Deep-sequencing template libraries were generated by PCR amplification of
shRNA guide strands as previously described. Libraries were analysed on an
Illumina Genome Analyser at a final concentration of 8 pM; 18 nucleotides of 
the guide strand were sequenced using a custom primer (miR30EcoRISeq, 
TAGCCCCTTGAATTCCGAGGCAGTAGGCA). To provide a sufficient base- 
line for detecting shRNA depletion in experimental samples, we aimed to acquire 
>500 reads per shRNA in the T0 sample, which required more than ten million
reads per sample to compensate for disparities in shRNA representation inherent
in the pooled plasmid preparation or introduced by PCR biases. With these con-
ditions, we acquired T0 baselines of >500 reads for 1,072 (97% of all) shRNAs.
Sequence processing was performed using a customized Galaxy platform. For
each shRNA and condition, the number of matching reads was normalized to the 
total number of library-specific reads per lane and imported into a database for 
further analysis (Access 2003, Microsoft). All shRNA sequences are provided in 
Supplementary Table 1. 
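
As a rough check on the representation figures above, the number of transduced cells per shRNA and the per-lane read normalization can be written out as follows. This is a minimal sketch: the function names and the toy read counts are hypothetical, and the calculation assumes that transduced cells are spread evenly across the library, which the text itself notes is only approximately true.

```python
def cells_per_shrna(cells_at_infection, transduction_efficiency, library_size):
    """Average number of transduced cells carrying each shRNA (even spread assumed)."""
    return cells_at_infection * transduction_efficiency / library_size


def normalize_reads(raw_counts):
    """Normalize each shRNA's reads to the total library-specific reads in the lane."""
    total = sum(raw_counts.values())
    return {shrna: n / total for shrna, n in raw_counts.items()}


# Figures stated in the text: thirty million cells, 2% transduction, 1,094 shRNAs.
representation = cells_per_shrna(30_000_000, 0.02, 1094)
print(round(representation))  # about 548 cells per shRNA, consistent with ">500"

# Hypothetical raw counts for two shRNAs in one lane.
print(normalize_reads({"shA": 750, "shB": 250}))
```

Normalizing to the lane total makes T0 and T14 samples comparable even when sequencing depth differs between lanes.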

Animal studies. All mouse experiments were approved by the Cold Spring Harbor 
animal care and use committee. Leukaemia cells were transplanted by tail-vein 
injection of 1 × 10^6 cells into sub-lethally (5.5 Gy) irradiated B6/SJL(CD45.1)
recipient mice. For whole-body bioluminescent imaging, mice were intraperito-
neally injected with 50 mg kg^-1 D-Luciferin (Goldbio), and after 10 min, analysed
using an IVIS Spectrum system (Caliper LifeSciences). Quantification was per-
formed using Living Image software (Caliper LifeSciences) with standardized
rectangular regions of interest covering the mouse trunk and extremities. For
shRNA induction, animals were treated with doxycycline in both drinking water
(2 mg ml^-1 with 2% sucrose; Sigma-Aldrich) and food (625 mg kg^-1, Harlan
laboratories). For JQ1 treatment trials, a stock of 100 mg ml^-1 JQ1 in DMSO
was diluted 20-fold by dropwise addition of a 10% 2-hydroxypropyl-β-cyclodex-
trin carrier (Sigma) under vortexing, yielding a 5 mg ml^-1 final solution. Mice were
intraperitoneally injected daily with freshly diluted JQ1 (50 or 100 mg kg^-1) or a
similar volume of carrier containing 5% DMSO.
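
The dilution arithmetic in the JQ1 dosing description can be checked with a short calculation. The sketch below uses the stated stock concentration, dilution factor and dose; the 20 g body weight is an assumption chosen only to make the example concrete, and the function names are hypothetical.

```python
def working_concentration(stock_mg_per_ml, dilution_factor):
    """Concentration after diluting the DMSO stock into carrier."""
    return stock_mg_per_ml / dilution_factor


def injection_volume_ml(dose_mg_per_kg, body_weight_g, conc_mg_per_ml):
    """Volume needed to deliver the dose to an animal of the given weight."""
    dose_mg = dose_mg_per_kg * body_weight_g / 1000.0
    return dose_mg / conc_mg_per_ml


conc = working_concentration(100.0, 20)   # 100 mg/ml stock diluted 20-fold
dmso_percent = 100.0 / 20                 # pure DMSO stock diluted 20-fold
volume = injection_volume_ml(50.0, 20.0, conc)  # assumed 20 g mouse, 50 mg/kg

print(conc, dmso_percent, volume)
```

Under these assumptions the working solution is 5 mg per ml in 5% DMSO, matching the text, and a 20 g mouse would receive 0.2 ml for the 50 mg per kg dose.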

Plasmids. For conditional RNAi experiments, shRNAs were expressed from 
either the TRMPV-Neo vector or the TtTMPV-Neo vector, which have been 
described previously (and are available as Addgene catalogue nos 27990 and 
27993). For screen validation, shRNAs were cloned into LMN (MSCV-miR30-
PGK-NeoR-IRES-GFP), which was generated from LMP by replacing the PuroR
transgene with a NeoR cassette. For Myc rescue experiments, the wild-type mouse
Myc cDNA was subcloned into MSCV-PGK-Puro-IRES-GFP (MSCV-PIG).

Cell culture. All mouse MLL-leukaemia cell lines were derived from bone marrow
obtained from terminally ill recipient mice, and were cultured in RPMI 1640
(Gibco-Invitrogen) supplemented with 10% FBS, 100 U ml^-1 penicillin and 100 µg
ml^-1 streptomycin. MLL-AF9 (alone), MLL-AF9/NrasG12D, Tet-on MLL-AF9/
NrasG12D and MLL-ENL/FLT3ITD cell cultures were derived as described
previously. Tet-on immortalized MEF cultures were described previously. G1E
cells were provided by M. Weiss. MEF cells were grown in DMEM with 10% FBS,
100 U ml^-1 penicillin, 100 µg ml^-1 streptomycin and 1% glutamine (GIBCO). G1E
cells were grown in IMDM with 15% FBS, 100 U ml^-1 penicillin, 100 µg ml^-1
streptomycin, 2 U ml^-1 erythropoietin (Sigma) and 10% Kit-ligand-conditioned
medium. All human leukaemia cell lines were cultured in RPMI-1640 with 10%
FBS, 100 U ml^-1 penicillin and 100 µg ml^-1 streptomycin, except KASUMI-1 cells,
which were cultured in 20% FBS. NOMO-1, MOLM-13, EoL-1, NB4, HNT-34 and 
CMK were purchased from Deutsche Sammlung von Mikroorganismen und 
Zellkulturen GmbH (DSMZ). KASUMI-1, HL-60, MV4-11, KG1, HEL, THP-1, 
B16-F10 and IMR-90 were obtained from ATCC. K-562 cells were provided
by M. Carroll. U2OS, HeLa and Jurkat were provided by the CSHL tissue culture
facility. CD34.MA9.NRAS and CD34.MA9.FLT3 cells were generated by retro-
viral transduction of umbilical-cord-blood CD34+ cells as described previ-
ously. All retroviral packaging was performed using ecotropic Phoenix
cells according to established protocols (http://www.stanford.edu/group/ 
nolan/tutorials/retpkg_1_packlines.html). 

Western blotting. For Brd4 immunoblotting, 30 µg of whole-cell lysate RIPA
extracts (25 mM Tris (pH 7.6), 150 mM NaCl, 1% NP-40, 1% sodium deoxycho-
late, 0.1% SDS) were loaded into each lane. For Myc immunoblotting, cells were
lysed directly in Laemmli buffer and about 50,000 cell-equivalents were loaded 
into each lane. Protein extracts were resolved by SDS polyacrylamide gel electro- 
phoresis (SDS-PAGE) and transferred to nitrocellulose for blotting. 

Proliferation assay. Competitive proliferation assays using shRNAs in LMN or
TRMPV-/TtTMPV-Neo vectors were performed as outlined in Supplementary
Fig. 3a and as described previously, respectively. Proliferation assays for JQ1 in
vitro testing were performed by counting the increase in viable cell numbers over
72 h in the presence of different JQ1 concentrations. Dead cells were excluded
using propidium iodide (PI) staining. Measurements of cell concentration were
performed on a Guava Easycyte (Millipore), gating only viable cells (FSC/SSC/
PI-). Proliferation rates were calculated using the equation ln(cell concentration at
72 h/cell concentration at 0 h)/72. Relative proliferation rates were calculated by
normalizing to the rate of DMSO-treated cells.
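
The proliferation-rate equation above translates directly into code. The following sketch applies it to hypothetical cell concentrations (the numbers are illustrative, not measured values) and normalizes a treated culture to its DMSO control.

```python
import math


def proliferation_rate(conc_0h, conc_72h, hours=72.0):
    """Exponential growth rate per hour: ln(c_72h / c_0h) / 72."""
    return math.log(conc_72h / conc_0h) / hours


def relative_rate(treated_rate, vehicle_rate):
    """Rate of the treated culture normalized to the DMSO-treated culture."""
    return treated_rate / vehicle_rate


# Hypothetical viable-cell concentrations (cells per ml).
vehicle = proliferation_rate(1e5, 8e5)   # three doublings in 72 h
treated = proliferation_rate(1e5, 2e5)   # one doubling in 72 h

print(relative_rate(treated, vehicle))   # close to 1/3
```

Because the rate is logarithmic, a culture that doubles once while the control doubles three times scores one third, not one quarter as a simple ratio of final cell numbers would suggest.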

May-Grünwald-Giemsa cytospin staining. MLL-AF9/NrasG12D leukaemia cells
were treated with 1 µg ml^-1 doxycycline for 2 days to induce shRNA expression
from TRMPV-Neo or TtTMPV vectors, or treated with 100 nM JQ1 for 2 days.
50,000 cells were resuspended in 100 µl FACS buffer (5% FBS, 0.05% NaN3 in PBS)
and cytospun onto glass slides using a Shandon Cytospin 2 Centrifuge at 500 rpm
for 5 min. May-Grünwald (Sigma) and Giemsa (Sigma) stainings were performed
according to manufacturer’s protocols. Images were collected using a Zeiss
Observer Microscope with a ×40 objective.

BrdU cell-cycle analysis and annexin V flow cytometry. BrdU incorporation 
assays were performed according to the manufacturer’s protocol (BD, APC BrdU 
flow kit), with cells pulsed with BrdU for 30 min. Cells were co-stained with 
7-aminoactinomycin D or 4',6-diamidino-2-phenylindole (DAPI) for DNA con- 
tent measurement. For all conditional shRNA experiments, the analysis was gated 
on Venus+/dsRed+ (shRNA+) cell populations. Annexin V apoptosis staining
was performed according to the manufacturer’s protocol (BD, APC annexin V).
To analyse shRNA-mediated induction of apoptosis specifically, annexin V was
quantified in viable shRNA-expressing cells (FSC/SSC; Venus+/dsRed+).
Notably, this gating selectively evaluates early apoptotic cells (annexin V+,
DAPI-), excluding accumulated dead cells (annexin V+, DAPI+). All analyses
were performed using FlowJo software (Tree Star). 

shRNA experiments in human AML cell lines. THP-1 and MOLM-13 cells were 
modified to express the ecotropic receptor and rtTA3 using retroviral transduction 
of MSCV-RIEP (MSCV-rtTA3-IRES-EcoR-PGK-Puro) followed by drug selec-
tion (0.5 and 1 µg ml^-1 puromycin for 1 week, respectively). The resulting cell lines
were transduced with ecotropically packaged TRMPV-Neo-shRNA retroviruses,
selected with 400 µg ml^-1 G418 for 1 week and treated with 1 µg ml^-1 doxycycline
to induce shRNA expression. The relative change in Venus+/dsRed+ (shRNA+)
cells was monitored on a Guava Easycyte (Millipore). BrdU cell-cycle analysis was 
performed as described above. 

Adult primary leukaemia sample analysis. The study was approved by the 
Institutional Review Board (ethics committee) of the Medical University of 
Vienna. Primary leukaemic cells were obtained from peripheral blood or bone 
matrow aspirate samples. Informed consent was obtained before blood donation 
or bone marrow puncture in each case. Diagnoses were established according to 
criteria provided by the French-American-British (FAB) cooperative study 
group’’* and the World Health Organization (WHO). Mononuclear cells were 
prepared using Ficoll and stored in liquid nitrogen until used. HL60 and MOLM13 
cell lines (obtained from DSMZ) were included as controls. After thawing, the 
viability of AML cells ranged from 70% to 99% as assessed by trypan blue exclusion.
Primary cells (thawed mononuclear cells, 5-10 × 10⁴ cells per well) and cell
lines (1-5 × 10⁴ cells per well) were cultured in 96-well microtitre plates (TPP) in
RPMI-1640 medium (PAA laboratories) with 10% fetal calf serum (FCS,
Pasching) in the absence or presence of JQ1 (10-5,000 nM) at 37°C (5% CO₂)
for 48 h. In selected experiments, primary AML cells were incubated with JQ1 in the
presence or absence of a cocktail of proliferation-inducing cytokines: recombinant
human (rh) G-CSF, 100 ng ml⁻¹ (Amgen), rhSCF, 100 ng ml⁻¹ (Peprotech) and
rhIL-3, 100 ng ml⁻¹ (Novartis). After 48 h, 0.5 µCi ³H-thymidine was added (16 h).
Cells were then harvested on filter membranes in a Filtermate 196 harvester 
(Packard Bioscience). Filters were air-dried and the bound radioactivity was 
measured in a β-counter (Top-Count NXT, Packard Bioscience). All experiments
were performed in triplicate. Proliferation was calculated as a percentage of
control (cells kept in control medium), and the inhibitory effects of JQ1 were
expressed as IC₅₀ values. In 7 out of 12 patients, drug-exposed cells were analysed
for morphologic signs of differentiation by Wright-Giemsa staining on cytospin 
slides. The thymidine incorporation assay was chosen as a proliferation assay 
because of its superior sensitivity and ease of implementation for suspension cells, 
as compared to other proliferation assays, such as MTT. 
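
The calculation described above (proliferation as a percentage of control, summarized as an IC₅₀) can be illustrated with a minimal sketch. The source does not state how IC₅₀ values were derived, so the log-linear interpolation used here is an assumption, and all counts, doses and function names are hypothetical.

```python
import math


def percent_of_control(treated_cpm, control_cpm):
    """Mean treated counts expressed as a percentage of mean control counts."""
    mean_treated = sum(treated_cpm) / len(treated_cpm)
    mean_control = sum(control_cpm) / len(control_cpm)
    return 100.0 * mean_treated / mean_control


def ic50_by_interpolation(doses_nm, percents):
    """Estimate IC50 by linear interpolation on log10(dose).

    Assumption: the response falls monotonically with dose, and the IC50 lies
    between the two adjacent doses that bracket 50% of control.
    Returns None if 50% is never crossed.
    """
    points = list(zip(doses_nm, percents))
    for (d_lo, p_lo), (d_hi, p_hi) in zip(points, points[1:]):
        if p_lo >= 50.0 >= p_hi and p_lo != p_hi:
            frac = (p_lo - 50.0) / (p_lo - p_hi)
            log_ic50 = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_ic50
    return None


# Hypothetical triplicate counts (c.p.m.), invented purely for illustration.
control = [10000, 9800, 10200]
treated = {
    10: [9500, 9700, 9300],
    100: [7000, 7200, 6800],
    1000: [2000, 2100, 1900],
    5000: [500, 450, 550],
}
doses = sorted(treated)
percents = [percent_of_control(treated[d], control) for d in doses]
ic50 = ic50_by_interpolation(doses, percents)
print(percents, ic50)
```

A four-parameter logistic fit is the more common choice when enough dose points are available; interpolation is shown only because it needs no fitting library.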

Paediatric primary leukaemia sample analysis. Diagnostic bone marrow samples 
were collected, under protocols approved by an institutional review board, from 
newly diagnosed children with acute leukaemia. Informed consent was obtained in 
accordance with the Helsinki protocol. At the time of collection, primary leukaemic 
cells were enriched by density centrifugation using Ficoll-Paque PLUS (GE 
Healthcare) and subsequently stored in liquid nitrogen. Vials of cryopreserved cells 
were thawed, resuspended in media, and live leukaemic cells were enriched by 
density centrifugation. Cells were maintained in supplemented media with 20% 
FBS. All leukaemia cell cultures were incubated at 37°C in 5% CO₂. Primary
leukaemia samples were treated with dose-ranges of JQ1 and vehicle control for 
72 h in 96-well plates. For the annexin binding assays, cells were harvested and
stained with Annexin V-PE and 7-AAD (BD Pharmingen), read on a FACSCalibur 
and analysed with FlowJo software (Tree Star). For the WST-1 assays, WST-1 
reagent (Roche Diagnostics) was added to the culture medium (1:10 dilution) 
and absorbance was measured at 450 nm using a Bio-Rad model 680 microplate 
reader (Bio-Rad Laboratories). WST-1 assays were performed in triplicate. 
Primary leukaemia samples were treated with 250 nM JQ1 and vehicle control
for 48 h in 96-well plates. Cytospins were prepared at baseline, 24 h and 48 h and
stained with Wright-Giemsa solution (Sigma-Aldrich). Images were acquired 
using a Nikon Eclipse E600 microscope system (Nikon). Although similar to other 
metabolic assays measuring cell proliferation (for example, MTT), WST-1 has 
superior sensitivity for the assessment of cytotoxicity in primary leukaemia samples. 

Histological analysis of bone marrow. Paraffin-embedded sections were stained
with haematoxylin and eosin (H&E). Photographs were taken on a Nikon Eclipse 
80i microscope with a Nikon Digital Sight camera using NIS-Elements F2.30 
software at a resolution of 2560 × 1920. Using Adobe Photoshop CS2, images
were re-sized and set at a resolution of 300 pixels inch⁻¹, autocontrast was applied
and unsharp mask was used to improve image clarity. 

FACS evaluation of normal haematopoiesis. Bone marrow cells were obtained 
by flushing mouse femurs and tibias, followed by erythrocyte lysis with ACK buffer 
(150 mM NH₄Cl, 10 mM KHCO₃ and 0.1 mM EDTA). Samples were washed in
FACS buffer (5% FBS, 0.05% NaN₃ in PBS), followed by staining (two million cells
in 100 µl of FACS buffer) for 1 h. Antibody dilutions used were: mouse haematopoietic
lineage eFluor 450 cocktail (1:100), PE-Cy7 anti-mouse Kit (1:50), APC
anti-mouse Sca-1 (1:100), APC anti-mouse B220 (1:100), APC anti-mouse Cd11b 
(1:100), APC anti-mouse TER-119 (1:100) and APC anti-mouse Gr-1 (1:100). 
Stained samples were analysed on an LSRII flow cytometer. Data analysis was 
performed using FlowJo software (Tree Star).

Expression microarrays. Microarrays were performed through the CSHL micro- 
array shared resource. RNA was isolated from 10⁷ cells using RNeasy Mini Kit
(Qiagen). RNA quality was assessed on an Agilent 2100 Bioanalyser, RNA 
6000 Pico Series II Chips (Agilent) and samples with a RIN score of 7.0 or greater 
passed. RNA was amplified by a modified Eberwine technique; amplified antisense
RNA was then converted to cDNA using a WT Expression kit (Ambion). The 
cDNA was then fragmented and terminally labelled with biotin, using the 
Affymetrix GeneChip WT Terminal Labelling kit (Affymetrix). Samples were then 
prepared for hybridization, hybridized, washed and scanned according to the 
manufacturer’s instructions on Mouse Gene ST 1.0 GeneChips (Affymetrix). 
Affymetrix Expression Console QC metrics were used to pass the image data. 
Raw data was processed by Affymetrix and Limma packages in R-based 
Bioconductor. Heatmaps were generated using GenePattern software”; RMA-processed
microarray data was converted into a log₂ scale, selected gene lists were
row-normalized and visualized using the HeatMapImage module on GenePattern. 
All raw microarray data files are available from the Gene Expression Omnibus 
(GSE29799). 
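
As a rough illustration of the heat-map preprocessing described above, the sketch below log₂-transforms intensities and then normalizes each gene (row) across samples. The exact row-normalization used by the HeatMapImage module is not specified in the text; a per-row z-score is assumed here, and the input values and function names are hypothetical.

```python
import math
import statistics


def log2_transform(matrix):
    """Convert linear-scale intensities to a log2 scale."""
    return [[math.log2(value) for value in row] for row in matrix]


def row_normalize(matrix):
    """Centre each row on its mean and scale by its standard deviation.

    Assumption: "row-normalized" means a per-gene z-score. Rows with zero
    variance are returned as all zeros rather than dividing by zero.
    """
    normalized = []
    for row in matrix:
        mean = statistics.fmean(row)
        sd = statistics.pstdev(row)
        normalized.append([(v - mean) / sd if sd > 0 else 0.0 for v in row])
    return normalized


# Hypothetical intensities: rows are genes, columns are samples.
intensities = [
    [256.0, 512.0, 1024.0],
    [64.0, 64.0, 64.0],
]
log_values = log2_transform(intensities)
z_scores = row_normalize(log_values)
print(log_values)
print(z_scores)
```

Row-wise scaling puts genes with very different absolute expression on a common colour scale, which is why it is usually applied only for display and not for the statistical testing itself.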


GSEA analysis. Gene set enrichment analysis” was performed using GSEA v2.07 
software with 1,000 phenotype permutations. Leukaemia-stem-cell and Myc gene 
sets were obtained from the indicated publications’'””*. The macrophage- 
development gene set was obtained from the Ingenuity Pathway Analysis (IPA) 
software (Ingenuity). To perform GSEA on human microarray data, mouse gene 
sets were converted into human gene names using bioDBNet dbWalk
(http://biodbnet.abcc.ncifcrf.gov/db/dbWalk.php) or manually using the NCBI database.
A detailed description of GSEA methodology and interpretation is provided at
http://www.broadinstitute.org/gsea/doc/GSEAUserGuideFrame.html. In brief,
the normalized enrichment score (NES) provides ‘the degree to which a gene set 
is overrepresented at the top or bottom of a ranked list of genes’. The false dis- 
covery rate q-value (FDR q-val) is ‘the estimated probability that a gene set with a 
given NES represents a false positive finding’. ‘In general, given the lack of coher- 
ence in most expression datasets and the relatively small number of gene sets being 
analyzed, an FDR cutoff of 25% is appropriate.’ Gene sets used in this study are 
included in Supplementary Table 2. 
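
To make the enrichment score underlying the NES more concrete, the following is a simplified sketch of the weighted running-sum statistic used in gene set enrichment analysis. It is not the GSEA software itself: it omits the phenotype permutations, NES normalization and FDR estimation, and the gene names and scores are hypothetical placeholders.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Signed maximum deviation of a weighted running sum over a ranked list.

    ranked_genes: gene identifiers ordered from most up- to most downregulated.
    scores: ranking metric for each gene, in the same order.
    gene_set: identifiers belonging to the set being tested.
    p: weighting exponent applied to |score| for genes in the set.
    """
    members = set(gene_set)
    hit_weights = [
        abs(score) ** p if gene in members else 0.0
        for gene, score in zip(ranked_genes, scores)
    ]
    total_hit = sum(hit_weights)
    n_miss = sum(1 for gene in ranked_genes if gene not in members)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list without covering it")

    running = 0.0
    best = 0.0
    for gene, weight in zip(ranked_genes, hit_weights):
        if gene in members:
            running += weight / total_hit   # step up at a set member
        else:
            running -= 1.0 / n_miss         # step down otherwise
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list (placeholder identifiers and scores).
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
metric = [3.0, 2.0, 1.0, -0.5, -1.0, -2.0]
es_top = enrichment_score(genes, metric, ["g1", "g2"])
es_bottom = enrichment_score(genes, metric, ["g5", "g6"])
print(es_top, es_bottom)
```

A set concentrated at the top of the ranking gives a positive score and one concentrated at the bottom gives a negative score; significance then depends on comparing the observed score with scores from permuted data.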

Chromatin immunoprecipitation. ChIP assays were performed exactly as 
described*'. Crosslinking was performed with sequential EGS (Pierce) and 
formaldehyde’. All results were quantified by quantitative PCR performed using 
SYBR green (ABI) on an ABI 7900HT. Each immunoprecipitate signal was 
referenced to an input standard-curve dilution series (immunoprecipitate/input) 
to normalize for differences in starting cell number and for primer amplification 
efficiency. 
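
The input standard-curve normalization described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' analysis script: the dilution fractions, Ct values and function names are hypothetical, and a simple least-squares line of Ct against log₁₀(input fraction) is assumed.

```python
import math


def fit_standard_curve(input_fractions, ct_values):
    """Least-squares fit of Ct = slope * log10(input fraction) + intercept."""
    xs = [math.log10(f) for f in input_fractions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fraction_of_input(ct_ip, slope, intercept):
    """Read an immunoprecipitate Ct off the input curve (IP/input)."""
    return 10 ** ((ct_ip - intercept) / slope)


def amplification_efficiency(slope):
    """Per-cycle efficiency implied by the curve slope (1.0 = perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical input dilution series for one primer pair: 10%, 1% and 0.1% input.
fractions = [0.1, 0.01, 0.001]
cts = [24.0, 27.3, 30.6]
slope, intercept = fit_standard_curve(fractions, cts)
ip_fraction = fraction_of_input(27.3, slope, intercept)
efficiency = amplification_efficiency(slope)
print(slope, intercept, ip_fraction, efficiency)
```

Because each primer pair gets its own curve, differences in amplification efficiency between amplicons are absorbed into the slope rather than distorting the IP/input ratio.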

qRT-PCR. RNA was prepared using TRIzol reagent (Invitrogen). Synthesis of 
cDNA was performed using qScript cDNA SuperMix (Quanta Biosciences). 
Quantitative PCR analysis was performed on an ABI 7900HT with SYBR green 
(ABI). All signals were quantified using the ΔCt method. All signals were normalized
to the levels of Gapdh. 
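
For readers unfamiliar with the ΔCt calculation, a minimal worked example follows. It assumes approximately 100% amplification efficiency (one doubling per cycle), which the text does not state explicitly; the Ct values and variable names are hypothetical.

```python
def relative_expression(ct_target, ct_reference):
    """Expression of a target relative to a reference gene: 2^-(ΔCt).

    ΔCt = Ct(target) - Ct(reference); assumes one doubling per PCR cycle.
    """
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


# Hypothetical Ct values for one target gene and Gapdh in two conditions.
target_vehicle = relative_expression(ct_target=22.0, ct_reference=18.0)
target_treated = relative_expression(ct_target=24.0, ct_reference=18.0)
fold_change = target_treated / target_vehicle
print(target_vehicle, target_treated, fold_change)
```

In this example the target crosses threshold two cycles later in the treated sample while Gapdh is unchanged, which corresponds to a fourfold reduction in relative expression.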

Antibodies. The anti-Brd4 antibody used for western blotting was a gift from G. 
Blobel. The anti-Brd4 antibody used for ChIP was purchased from Sigma 
(HPA015055). The anti-Myc antibody was purchased from Epitomics (1472-1). 
Antibodies used in FACS were: APC anti-mouse CD117/Kit (Biolegend, 105811), 
APC anti-mouse CD11b (Biolegend, 101211), Pacific Blue anti-mouse CD45.2 
(Biolegend, 109820), mouse haematopoietic lineage eFluor 450 cocktail 
(eBioscience, 88-7772-72), APC anti-mouse CD45R/B220 (Biolegend, 103212), 
APC anti-mouse TER-119/erythroid cells (Biolegend, 116212), APC anti-mouse 
Ly-6G/Gr-1 (eBioscience, 17-5931), PE-Cy7 anti-mouse CD117/Kit (eBioscience, 
25-1171-82) and APC anti-mouse Sca-1 (eBioscience, 17-5981-81). The anti-β-actin
HRP antibody was purchased from Sigma (A3854).

Recurrent chromosomal translocations involving the mixed lineage
leukaemia (MLL) gene initiate aggressive forms of leukaemia, which 
are often refractory to conventional therapies’. Many MLL-fusion 
partners are members of the super elongation complex (SEC), a 
critical regulator of transcriptional elongation, suggesting that 
aberrant control of this process has an important role in leukaemia 
induction”*. Here we use a global proteomic strategy to demon- 
strate that MLL fusions, as part of SEC’ and the polymerase- 
associated factor complex (PAFc)**, are associated with the BET 
family of acetyl-lysine recognizing, chromatin ‘adaptor’ proteins. 
These data provided the basis for therapeutic intervention in MLL- 
fusion leukaemia, via the displacement of the BET family of proteins 
from chromatin. We show that a novel small molecule inhibitor of 
the BET family, GSK1210151A (I-BET151), has profound efficacy 
against human and murine MLL-fusion leukaemic cell lines, 
through the induction of early cell cycle arrest and apoptosis. 
I-BET151 treatment in two human leukaemia cell lines with differ- 
ent MLL fusions alters the expression of a common set of genes 
whose function may account for these phenotypic changes. The 
mode of action of I-BET151 is, at least in part, due to the inhibition 
of transcription at key genes (BCL2, C-MYC and CDK6) through 
the displacement of BRD3/4, PAFc and SEC components from 
chromatin. In vivo studies indicate that I-BET151 has significant 
therapeutic value, providing survival benefit in two distinct mouse 
models of murine MLL-AF9 and human MLL-AF4 leukaemia. 
Finally, the efficacy of I-BET151 against human leukaemia stem 
cells is demonstrated, providing further evidence of its potent thera- 
peutic potential. These findings establish the displacement of BET 
proteins from chromatin as a promising epigenetic therapy for 
these aggressive leukaemias. 

Dysregulation of chromatin modifiers is a recurrent and sentinel 
event in oncogenesis®. Therapeutic strategies that selectively alter the 
recruitment and/or catalytic activity of these enzymes at chromatin 
therefore hold great promise as targeted therapies®. In this regard the 
bromodomain and extra terminal (BET) family of proteins (BRD2, 
BRD3, BRD4 and BRDT) provide an ideal ‘druggable’ target, because
they share a common highly conserved tandem bromodomain at their 
amino terminus. Selective bromodomain inhibitors that disrupt the 
binding of BET proteins to histones have recently been described’*; 
however, their true therapeutic scope remains untested. 

To identify the nuclear complexes associated with ubiquitously 
expressed BETs (BRD2/3/4), we performed a systematic global pro- 
teomic survey. Specifically, this involved a tri-partite discovery 
approach (Fig. 1a). In the first approach, bead-immobilized analogues
of I-BET762 (ref. 9) were incubated with HL60 nuclear extracts and
bound proteins were analysed by quantitative mass spectrometry 
(Supplementary Table 1). This approach identified the BET isoforms 
and a large number of co-purifying proteins (Supplementary Tables 1 
and 2), indicating that the BET isoforms reside in many distinct protein 
complexes. In the second approach, immunoprecipitation analyses 
with selective antibodies against BRD2/3/4 were performed (Sup- 
plementary Fig. 1 and Supplementary Tables 3 and 4). This was com- 
plemented with additional immunoprecipitations using selected 
antibodies against complex members (‘baits’) selected from the subset 
of proteins that were identified in the first approach (Fig. 1b right 
panel, Supplementary Fig. 2 and Supplementary Table 3). In the third 
approach, bead-immobilized histone H4(1-21; K5acK8acK12ac) 
acetylated peptides were used to purify protein complexes. These data 
were combined to highlight a list of complexes identified in all three 
methods (Fig. 1b left panel, Supplementary Fig. 3 and Supplementary 
Table 1). Finally, specificity of the I-BET762 and histone tail matrix 
was further assessed by competition experiments (Fig. 1c, Supplemen- 
tary Figs 4, 5 and Supplementary Table 2). This strategy enabled 
the direct determination of the targets of the inhibitor, and the proteins 
associated with the target, with subunits of protein complexes 
exhibiting closely matching half-maximum inhibitory concentration 
(IC₅₀) values’®. Taken together these stringent and complementary
approaches provide a high confidence global data set encompassing 
all known''”? and several novel BET protein complexes (Fig. 1b and 
Supplementary Fig. 3). Among the novel complexes, we observed a 
prominent enrichment and dose-dependent inhibition of several com- 
ponents of the PAFc*? and SEC”® (Fig. 1b, c), which were confirmed by 
reciprocal immunoprecipitations in HL60 cells (Fig. 1b). Moreover, 
reciprocal immunoprecipitations in two MLL-fusion leukaemia cell 
lines (MV4;11 and RS4;11) confirmed the relationship of SEC with 
BRD4 in different cellular contexts (Fig. 1d). Together these data indi- 
cate that BRD3 and BRD4 associate with the PAFc and SEC and may 
function to recruit these complexes to chromatin. Given that these 
complexes are crucial for malignant transformation by MLL fusions” ° 
we tested the hypothesis that displacement of BET proteins from chro- 
matin may have a therapeutic role in these leukaemias. 

To progress our studies with an optimized therapeutic agent we
developed I-BET151 (Fig. 1e), a novel dimethylisoxazole template
previously undisclosed as a BET bromodomain inhibitor. It was identified
and optimized to retain excellent BET target potency (Fig. 1i)
and selectivity (Fig. 1h, Supplementary Figs 5-10 and Supplementary
Table 5) while enhancing the in vivo pharmacokinetics and terminal
half-life to enable prolonged in vivo studies (Fig. 4a and Supplementary
Fig. 20). We also generated proteomic selectivity profiles comparing
I-BET151 with I-BET762 (Fig. 1h, Supplementary Fig. 5 and Supplementary
Table 6). We bead-immobilized a combination of differentially
acetylated histone tail peptides (Supplementary Table 7),
which captured a total of 27 bromodomain proteins from HL60 nuclear
extracts. Competition with excess I-BET151 or I-BET762 blocked
the capture of BRD2, BRD3, BRD4 and BRD9 but had no effect on the
23 other bromodomain proteins including MLL. The inhibition of
BRD9 is likely to be indirect as this protein forms a complex with
BRD4 (Supplementary Table 3). Finally, a high-resolution (1.5 Å)
crystal structure of I-BET151 bound to BRD4-bromodomain 1
(BD1) revealed binding to the acetylated-lysine (AcK) recognition
pocket of the BET protein (Fig. 1f, g and Supplementary Fig. 10).

Figure 1 | A global proteomic survey identifies BET proteins as part of the
PAFc and SEC. a, Proteomic strategy. b, Left, Cytoscape representation of the
BET protein complex network (discussed in detail in Supplementary Fig. 3).
Bold circles indicate associations confirmed by the three orthogonal methods.
Right, heat map representing quantitative mass spectrometry data following
co-immunoprecipitation of BETs, PAF and SEC complex members.
c, Differential proteomic analysis of the proteins interacting with I-BET and
triple acetylated histone H4 tail. Left, affinity matrices with immobilized
I-BET762 or histone H4(K5acK8acK12ac) peptide bind to the same set of BET
complexes. Protein abundance was determined from signal intensities in the
mass spectrometer (arbitrary units, K = 1,000). Right, competitive inhibition
of the binding of BET isoforms, and SEC and PAF complex components, to the
I-BET762 matrix showing matching concentration dependence. d, BRD4 and
MLLT1 interact in HL60, MV4;11 and RS4;11 cells and binding to the
I-BET762 matrix is blocked by excess I-BET151. e, Chemical structure of
GSK1210151A (I-BET151). f, I-BET151 binding to the acetyl-binding pocket of
BRD4-BD1 (cyan) overlaid with H3K14-acetyl peptide (green) (Protein
Database ID 3jvk). A surface representation of the BRD4-BD1 is shown with
key recognition and the specificity WPF shelf identified. g, Ribbon
representation of the BRD4-BD1 (cyan) crystal structure complexed with
I-BET151 (shown in magenta stick format) overlaid with H3(12-19)K14ac
peptide (green) taken from its complex with BRD4-BD1 (PDB ID 3jvk).
Secondary elements of the BRD4-BD1 structure have been highlighted.
h, Selectivity profile of I-BET151 showing average temperature shifts (Tₘ)
using a fluorescent thermal shift assay. Numbering inside the spheres indicates
bromodomains assessed; for example, 12 signifies both bromodomains 1 and 2
have been assessed. Overlaid is the selectivity profile generated using a
proteomic approach (shown as boxes around proteins, discussed in
Supplementary Fig. 5). Where the bromodomains have been profiled by both
thermal shift and proteomic approaches the agreement is excellent. Proteins
not assessed by either technique are shown in grey. i, Comparison of I-BET762
and I-BET151 potency in ligand displacement assays, direct Biacore binding
and lipopolysaccharide-stimulated IL-6 cytokine production from human
peripheral blood mononuclear cells (PBMC) or whole blood (WB).

To assess the therapeutic efficacy and selectivity of I-BET151, we
tested a panel of leukaemic cell lines harbouring a spectrum of distinct
oncogenic drivers. These data demonstrated that I-BET151 has potent
efficacy against cell lines harbouring different MLL-fusions (Fig. 2a
and Supplementary Fig. 11). To extend these data we tested the
clonogenic potential of human leukaemic cells grown in cytokine-supplemented
methylcellulose containing dimethylsulphoxide
(DMSO; vehicle) or I-BET151. Consistent with the profound effects
in liquid culture, the colony-forming potential of MLL-fusion-driven
leukaemias (MOLM13) was completely ablated by I-BET151, whereas
leukaemias driven by tyrosine kinase activation (K562) were unaffected
(Fig. 2b). In addition to the data with human leukaemic
cell lines, we also confirmed the potent efficacy of I-BET151 in both
liquid culture and clonogenic assays using primary murine progenitors
retrovirally transformed with either MLL-ENL or MLL-AF9 (Fig. 2c).

Figure 2 | I-BET151 selectively and potently inhibits MLL-fusion leukaemic
cell lines in vitro. a, Human leukaemia cell lines tested using I-BET151.
b, Clonogenic assays performed in the presence of DMSO or I-BET151.
c, Haematopoietic progenitors were isolated from mouse bone marrow and
retrovirally transformed with MLL-ENL or MLL-AF9. These cells were used in
both proliferation and clonogenic assays. d, Apoptosis was assessed by FACS
analysis after 72 h incubation with DMSO or I-BET151. e, Cell cycle
progression was assessed by FACS analysis 24 h after incubation with DMSO or
I-BET151 (y axis event count, x axis arbitrary fluorescence units). Bar graphs
are represented as the mean and error bars reflect standard deviation of results
derived from triplicate experiments.

To investigate the mechanism of action for I-BET151, we performed
fluorescence-activated cell sorting (FACS) analysis to assess apoptosis
and cell cycle progression after I-BET151 treatment. Figure 2d, e and
Supplementary Fig. 12 show a marked induction of apoptosis and a
prominent G₀/G₁ arrest in two MLL-fusion cell lines driven by distinct
MLL fusions (MOLM13 and MV4;11 containing MLL-AF9 and
MLL-AF4, respectively). In contrast, the cell cycle characteristics
and apoptotic rate of K562 cells were largely unaffected at this time.
These data indicate that I-BET151 alters the transcriptional programmes
regulating apoptosis and cell-cycle progression in MLL-fusion
leukaemias.

To identify the precise transcriptional pathways controlled by
I-BET151, global gene-expression analysis was performed in
MOLM13 and MV4;11 cells after treatment with I-BET151 or
DMSO for 6 h. This strategy allowed us to identify early I-BET151-responsive
genes, before any discernible phenotypic alteration in cell
cycle or apoptosis (Supplementary Fig. 12). As demonstrated previously’,
we observed differential expression of a selective subset of genes
(Fig. 3a), rather than global transcriptional dysregulation. Remarkably,
the transcriptional programmes altered in the two MLL-fusion cell
lines were highly correlated (Fig. 3b) and gene set enrichment analysis
documented significant overlap with published MLL fusion signatures
including MLL-fusion leukaemia stem cells (LSC)'*"* (Supplementary
Fig. 13). These data are consistent with the notion that MLL fusions
aberrantly co-opt the SEC and PAFc to regulate similar transcriptional
programmes. Notably, the top 100 genes concomitantly decreased in
both MOLM13 and MV4;11 (Fig. 3c) contained several previously
reported direct MLL targets, such as BCL2, CDK6 and MYC, the downregulation
of which was consistent with the phenotypic consequences
of I-BET151 treatment.

Figure 3 | Transcriptome and ChIP analyses provide mechanistic insights
for the efficacy of I-BET151. a, Volcano plots for DMSO against I-BET151
treated samples, showing the adjusted significance P value (log₁₀) versus fold
change (log₂). b, Correlation of log₂ fold change between MV4;11 and
MOLM13 across all genes. No genes show opposing expression changes. Lines
represent the identity line (black solid), the line of best fit (black dotted), or log₂
fold-change threshold values (green dotted). c, Heat map of top 100 genes
downregulated following treatment with I-BET151. d, BCL2 gene expression
(normalized to B2M expression) is shown. Expression level of BCL2 in DMSO
was assigned a value of 1. e, Immunoblotting demonstrating a decrease in BCL2
and an increase in cleaved PARP (*) after I-BET151 treatment. f, ChIP analysis
at the TSS and 3’ end of BCL2 is illustrated. Bar graphs are represented as the
mean enrichment relative to input and error bars reflect standard deviation of
results derived from biological triplicate experiments.

Figure 4 | I-BET151 is efficacious in in vivo murine models and primary
patient samples of MLL-fusion leukaemia. a, Murine pharmacokinetic
studies (mean ± s.d., n = 4 per compound) comparing the blood concentration
of I-BET151 with I-BET762 and JQ1. b, Kaplan-Meier curve of control and
treated NOD-SCID mice transplanted with 1 × 10⁷ MV4;11 cells. Green
arrowhead, treatment commencement on day 21. c, Haematoxylin and eosin-stained
histological sections of the renal parenchyma of control and treated
mice. Black arrows highlight leukaemic infiltration. d, Representative FACS
analysis from the peripheral blood of control or I-BET151-treated mice.
e, Kaplan-Meier curve of control and treated C57BL/6 mice transplanted with
2.5 × 10⁶ syngeneic MLL-AF9 leukaemic cells. Green arrowhead, treatment
commencement on day 9. f, Photomicrograph of the spleen size from 5/8
control and 1/12 I-BET151-treated mice that died on day 12. g, Haematoxylin
and eosin-stained histological sections of the liver parenchyma from control
and I-BET151-treated mice demonstrating reduced disease burden in the
treated animal. h-j, Peripheral blood white cell count (h), liver weight (i) and
spleen weights (j) from all the control and treated mice at the time of necropsy.
k, Representative FACS analysis assessing apoptosis from a patient with MLL-AF6
leukaemia. l, Clonogenic assays with human MLL-fusion LSC isolated by
FACS sorting (CD34⁺/CD38⁻) and plated in the presence of DMSO or
I-BET151. m, Gene expression changes in human MLL-fusion leukaemia cells
following treatment with I-BET151 or DMSO. The log₂ fold change in the
expression level for all genes (expression level with I-BET151 treatment/
expression level with DMSO) is represented. n, Schematic model proposing the
mode of action for I-BET151 in MLL-fusion leukaemia.

BCL2 is a key antiapoptotic gene implicated in the pathogenesis of 
MLL-fusion leukaemias!®!’. Consistent with these data, I-BET151 
reduced the expression of BCL2 in a third MLL-fusion cell line 
(NOMO1) but not in the unresponsive K562 cells (Fig. 3d), and induc- 
tion of apoptosis coincided with a marked reduction in BCL2 protein 
expression (Fig. 3e). Moreover, overexpression of BCL2 in the pres- 
ence of I-BET151 rescued the apoptotic phenotype (Supplementary 
Fig. 14). Chromatin immunoprecipitation (ChIP) analyses at the BCL2 
locus showed that 6 h of I-BET151 treatment selectively decreased the 
recruitment of BRD3/4 and impaired recruitment of CDK9 and PAF1 
(part of SEC and PAFc, respectively) to the transcriptional start site 
(TSS). This correlated with reduced phosphorylation of RNA poly- 
merase II (Pol II) on serine 2 of its carboxy-terminal domain (Pol- 
IIS2ph) (Fig. 3f). A similar pattern was observed at two other MLL 
target genes (MYC and CDK6), but not at housekeeping genes (B2M) 
whose expression was unaltered by I-BET151 (Supplementary Fig. 15).
Together, these data indicate that the mechanism of efficacy for
I-BET151 involves a selective abrogation of BRD3/4 recruitment to 
chromatin. The consequence of this is the inefficient phosphorylation/ 
recruitment of Pol II. Further investigation is necessary to distinguish 
whether Pol II recruitment and/or elongation is primarily affected by 
I-BET151. 

We next sought to establish the therapeutic potential of I-BET151 in
vivo. We first characterized the pharmacokinetic properties of 
I-BET151 in several preclinical species (Supplementary Fig. 20) and 
also compared it to published inhibitors”* (Fig. 4a). We then assessed 
the efficacy of I-BET151 in two established models of MLL leukaemia. 
Our first model was a xenotransplant model of disseminated human 
MLL-AF4 leukaemia'®. I-BET151 was delivered daily at 30 mg kg⁻¹ by
intraperitoneal injection from day 21 (ref. 18), and mice were humanely 
killed if clinical disease dictated or if there was a sequential rise in 
peripheral blood disease. At the experimental end-point all the control 
mice had succumbed to fulminant or progressive disease whereas only 
one out of five mice in the treated cohort had evidence of disease at low 
levels (Fig. 4b-d and Supplementary Fig. 16). In our second syngeneic 
model of murine MLL-AF9 leukaemia, 2.5 × 10⁶ leukaemic cells, established
from serial transplantation, were injected into tertiary recipients.
Despite the latency being reduced to less than 15 days, we waited to 
initiate treatment from day 9 to test the efficacy of I-BET151 in the 
setting of overwhelming established disease (Fig. 4e), the scenario often 
encountered in clinical practice. Even here I-BET151 provided a clear 
and marked survival benefit (Fig. 4e-j and Supplementary Fig. 17). 
Taken together, these data demonstrate that I-BET151 provides excel- 
lent control of MLL leukaemia progression in two distinct and comple- 
mentary murine models. 

Finally, to demonstrate the applicability of our findings to human 
disease, we tested the efficacy of I-BET151 in leukaemia cells isolated 
from patients with various MLL fusions. These data show that 
I-BET151 accelerates apoptosis (Fig. 4k and Supplementary Fig. 18), 
and abrogates clonogenic efficiency in bulk leukaemia (Supplementary 
Fig. 19) as well as isolated LSC (Fig. 4l). These effects are driven, at least
in part, by downregulation of a similar transcription programme identified
in MLL-fusion cell lines (Fig. 4m). Taken together, these data
provide compelling evidence of therapeutic potential and suggest that 
disease eradication is possible. 

The paradigm for epigenetic drug discovery shown here highlights an 
emerging role for targeting aberrant transcriptional elongation in onco- 
genesis”° and provides the first example in epigenetic therapy where 
mechanistic insights have driven targeted drug discovery and application 
(Fig. 4n). Together, our results suggest that perturbing the interaction of 
BET proteins with chromatin using I-BET151 may be of great thera- 
peutic value in human MLL-fusion leukaemias. Using a complementary 
strategy and a different BET inhibitor, a separate study published in this 
issue concurs with this view'’. Moreover, the extensive proteomic 
resource provided here has identified other important disease-associated 
proteins binding to BET proteins, such as MMSET (WHSC1), which is 
implicated in multiple myeloma”®. This raises the possibility that BET 
inhibitors may have an even wider therapeutic scope in oncology and 
perhaps in other areas of unmet need within the clinical arena. 


METHODS SUMMARY 

Cell culture, gene expression, chromatin immunoprecipitation and FACS analysis 
were performed as previously described”'. Proteomic profiling and characterization 
of inhibitor specificity was performed using methodology previously described”””. 
Detailed information about the reagents and methodology used in this study is 
available in Supplementary Information.

Active-site remodelling in the bifunctional
fructose-1,6-bisphosphate aldolase/phosphatase 


Juan Du, Rafael F. Say, Wei Lü, Georg Fuchs & Oliver Einsle


Fructose-1,6-bisphosphate (FBP) aldolase/phosphatase is a bifunc- 
tional, thermostable enzyme that catalyses two subsequent steps 
in gluconeogenesis in most archaea and in deeply branching 
bacterial lineages’*. It mediates the aldol condensation of heat- 
labile dihydroxyacetone phosphate (DHAP) and glyceraldehyde-3- 
phosphate (GAP) to FBP’, as well as the subsequent, irreversible 
hydrolysis of the product to yield the stable fructose-6-phosphate 
(F6P) and inorganic phosphate; no reaction intermediates are 
released. Here we present a series of structural snapshots of the 
reaction that reveal a substantial remodelling of the active site 
through the movement of loop regions that create different 
catalytic functionalities at the same location. We have solved the 
three-dimensional structures of FBP aldolase/phosphatase from 
thermophilic Thermoproteus neutrophilus”* in a ligand-free state 
as well as in complex with the substrates DHAP and FBP and the 
product F6P to resolutions up to 1.3 Å. In conjunction with
mutagenesis data, this pinpoints the residues required for the 
two reaction steps and shows that the sequential binding of 
additional Mg²⁺ cations reversibly facilitates the reaction. FBP
aldolase/phosphatase is an ancestral gluconeogenic enzyme opti- 
mized for high ambient temperatures’”, and our work resolves how 
consecutive structural rearrangements reorganize the catalytic 
centre of the protein to carry out two canonical reactions in a very 
non-canonical type of bifunctionality. 

Aldolases constitute a distinct group of lyase enzymes that catalyse 
the stereospecific addition of a nucleophilic donor substrate to an elec- 
trophilic acceptor’ ’. Class I aldolases are commonly homotetramers'® 
and are found in some bacteria, archaea and higher eukaryotes''”. They 
activate their donor substrate by forming a Schiff base with the ε-amino
group of a conserved lysine residue. Aldolases of class II are found in
bacteria and fungi and use divalent metal cations (mostly Zn²⁺, but
frequently Fe²⁺ or Co²⁺) to activate the donor nucleophile. Although
the two classes of aldolases do not show homologies in primary structure
and are thus phylogenetically distinct, they are structurally related
and belong to the family of (β/α)8 TIM barrel enzymes. Further
members of this family include most other aldolases, such as the homo- 
decameric class IA FBP aldolase*"’, transaldolase, deoxyribose phos- 
phate aldolase, 2-keto-3-deoxy-(6-phospho)-gluconate (KD(P)G) 
aldolase and 3-deoxy-D-arabino-heptulosonate-7-phosphate (DAHP)
synthase”. 

Fructose-1,6-bisphosphate aldolase (E.C. 4.1.2.13) catalyses the 
reversible aldol cleavage of FBP into DHAP and GAP in the Embden- 
Meyerhof-Parnas pathway” either in glycolysis or gluconeogenesis and 
in the Calvin-Benson cycle’. Most recently an enzyme previously 
described as an archaeal FBP phosphatase in Thermococcus kodakarensis 
and Sulfolobus tokodaii*” was shown to be indeed a bifunctional FBP 
aldolase/phosphatase (FBPAP)*. Although this enzyme is physiolo- 
gically unrelated to any known aldolase, it catalyses the reaction 
following the class I mechanism” involving a lysine Schiff base**”. 
The aldolase reaction is a classic case for a fully reversible enzymatic
reaction, but in FBPAP, in spite of a similar mechanism, the Km
values for the aldol condensation and the aldol cleavage differ by a
factor of 1,000, and the intrinsic phosphatase activity renders the process
irreversible. Orthologues of FBPAP are found in most genomes of
archaea and early bacterial lineages. 

A structure of the enzyme from S. tokodaii revealed a novel tertiary 
structure with a ferredoxin-like fold in the amino (N)-terminal part and 
similarities to bacterial S-adenosyl methionine decarboxylases in the 
carboxy (C)-terminal part. The protein formed globular homooctamers 
and had four Mg²⁺ ions and FBP bound to the active site. To
understand the twofold reactivity of FBPAP we have crystallized and 
characterized the enzyme from the hyperthermophilic crenarchaeon T. 
neutrophilus (TnFBPAP) in a ligand-free state, with the substrates of the
two reaction steps, DHAP and FBP, and with the product F6P. 
TnFBPAP is highly similar to the S. tokodaii orthologue, differing pre- 
dominantly in a C-terminal extension that embraces another protomer 
of the conserved octamer (Supplementary Fig. 1)””. However, three loop 
regions surrounding the active site of the enzyme substantially change 
their conformation when substrate, reaction intermediate or product is 
bound. The re-orientation of these loops during the catalytic cycle alters 
the structure and functionality of the active site fundamentally, while 
keeping the substrate locked in place. Unlike bi- or multifunctional 
enzymes described previously, FBPAP does not recruit distinct 
domains for its reactions or connect different active sites by substrate 
channelling. Instead the enzyme remodels its single active centre in
order to bring the amino-acid side chains and cofactors (divalent Mg²⁺
cations) into place.

In the following, the loop region from residues 220 to 235 will be 
termed the ‘aldolase loop’, as residues K232 and Y229 are essential for 
aldolase activity. It is observed in three different conformations that we 
designate ‘in’, ‘out’ and ‘locked’ (Fig. 1). The largest conformational 
changes were observed in the loop encompassing residues 89-111, the 
‘phosphatase lid’ that serves to fix the intermediate FBP in the binding 
pocket for the phosphatase reaction. It is seen in two conformations, 
‘open’ and ‘closed’. The third flexible loop is the ‘anchor loop’ from 
residues 353 to 364 that fixes FBP after the aldol condensation and 
attains an ‘in’ or an ‘out’ conformation. The consecutive binding of 
additional Mg²⁺ ions to the protein is then key to switching its catalytic
functionality. 

In the substrate-free structure of TnFBPAP, two Mg²⁺ ions are
bound to the protein, with Mg1 coordinated by residues D11, H18
and D52, and Mg2 by residues D52, D53, D132 and D234 from the
aldolase loop. Both Mg²⁺ ions show an octahedral coordination
environment, completed by three water ligands for Mg1 and by two
water ligands for Mg2 (Fig. 2a and Supplementary Figs 2 and 3a). In
the structure, a substrate binding cleft is visible close to the ions, and 
four of the five water ligands are at the surface of the protein. The 
phosphatase lid is in its ‘open’ conformation, the aldolase loop is ‘out’ 
and the anchor loop is ‘in’ (Figs 1a and 2a and Supplementary Fig. 3a). 
A structural analysis of EDTA-treated TnFBPAP shows that Mg1 can
be removed, while Mg2 remains bound to the protein. In a calorimetric
titration Mg1 binds with a Kd of 27 µM (Supplementary Fig. 5), but no
binding of further magnesium occurs. In this state the enzyme can
bind its first substrate, DHAP (but not GAP), with a Kd of 25 µM
(Supplementary Fig. 7), thereby creating a further Mg²⁺ binding site.
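
As a rough illustration of what the two dissociation constants above imply, the following sketch computes the fractional occupancy of a site under the simplest possible model: one independent site, with the free ligand concentration known. The binding model, the function name `fraction_bound` and the example concentrations are assumptions made for illustration; the constants 27 and 25 are taken to be in µM and are not re-derived here.

```python
def fraction_bound(free_ligand_uM: float, kd_uM: float) -> float:
    """Fractional occupancy of a single independent site: [L] / (Kd + [L])."""
    if free_ligand_uM < 0 or kd_uM <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return free_ligand_uM / (kd_uM + free_ligand_uM)


# Dissociation constants quoted in the text (assumed to be in micromolar).
KD_MG1_UM = 27.0
KD_DHAP_UM = 25.0

if __name__ == "__main__":
    # Illustrative free concentrations only; these are not from the paper.
    for conc_uM in (2.5, 25.0, 250.0):
        print(
            f"{conc_uM:7.1f} uM free ligand: "
            f"Mg1 site {fraction_bound(conc_uM, KD_MG1_UM):.2f}, "
            f"DHAP site {fraction_bound(conc_uM, KD_DHAP_UM):.2f}"
        )
```

Under this model a site is half occupied when the free ligand concentration equals its Kd, which is the only property the sketch relies on.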

Consequently, the crystal structure with bound DHAP contains 
three Mg²⁺ ions, and each of the C1-phosphoryl oxygen atoms coordinates


Figure 2 | The successive reaction steps of FBP aldolase/phosphatase in the 
crystal structures. a, The unliganded enzyme; b, the complex with 
dihydroxyacetone phosphate (DHAP); c, the complex with fructose-1,6- 
bisphosphate (FBP); d, the product complex with fructose-6-phosphate (F6P). 




Figure 1 | Representation of the reaction steps of FBP aldolase/ 
phosphatase. The phosphatase loop is shown in blue, the aldolase loop in red 
and the anchor loop in yellow. a, In the unliganded state the enzyme binds two 
Mg²⁺ ions and the active site is accessible. b, Binding of the substrate
dihydroxyacetone phosphate is followed by a third Mg²⁺ ion that redirects the
aldolase loop to form a Schiff base with the substrate. The anchor loop is
retracted to accommodate this conformational change. c, The co-substrate
glyceraldehyde-3-phosphate binds and the enzyme catalyses the aldol
condensation. Release of the Schiff base leads to a rearrangement of the aldolase
loop that creates a binding site for a fourth Mg²⁺ ion. The phosphatase loop
changes its conformation drastically, closing the active site. d, Upon hydrolysis
of the phosphate group at C1, the tetranuclear Mg site disassembles and the 
three loops attain their original conformations. 


one of the cations (Figs 1b and 2b and Supplementary Fig. 3b). At both 
Mg1 and Mg2, two phosphoryl oxygens replace the coordinated water
molecules (Supplementary Figs 2 and 3b). In the next step in the 
reaction cascade, the backbone carbonyl oxygen of residue K232 occu- 
pies a remaining, free coordination site at the metal ion. The following 
major structural rearrangement switches the aldolase loop to the ‘in’ 
position and the anchor loop to the ‘out’ position. The phosphatase lid 
remains open, but one of its residues, Q95, rearranges to become a 
ligand to Mg1, likely stabilizing the complex metal site. More importantly,
in the ‘in’ conformation of the aldolase loop K232 is in close
proximity to the substrate and is able to form a protonated Schiff base 
intermediate with DHAP that is visible in the crystal structure (Figs 1b 
and 2b and Supplementary Fig. 3b). 

The class-I aldolase mechanism requires a base that abstracts a 
proton in the next step*®*’, and in TnFBPAP residue Y229 is posi- 
tioned ideally to fulfil this role. It is part of the aldolase loop and swings 




The reaction is guided by the successive binding of a third and fourth Mg²⁺ ion
that orchestrate conformational rearrangements to position the aldolase (red), 
phosphatase (blue) and anchor loops (yellow) for their individual tasks. 
into the active site when K232 forms the Schiff base. The distinct and
isolated role of K232 and Y229 is emphasized by the fact that variants 
in these positions show only slightly reduced (Y229F) or even 
enhanced (K232R) phosphatase activity, whereas aldolase activity is 
completely abolished (Fig. 3 and Supplementary Table 2). The struc- 
ture of the Y229F variant of FBPAP shows binding of DHAP with two 
Mg²⁺ ions, but the formation of the Schiff base is not observed. After
its deprotonation, Y229 abstracts the pro-S proton at C3 of DHAP. 
This is ascertained by residue D297 that forms a short (2.6 Å) hydrogen
bond with the hydroxyl group at C3 so that the pro-S H atom is
fixed to face Y229 (Supplementary Fig. 3b). Proton abstraction results 
in a tautomeric rearrangement to yield an enamine intermediate. For 
the progression of the aldol condensation, the second substrate, GAP, 
then needs to bind to the active site. In the structure, the binding site 
for GAP is occupied by a second molecule of DHAP, giving a clear 
indication for the binding mode of the substrate, while being unreac- 
tive for the aldol condensation (Supplementary Fig. 3b). 

C-C bond formation between the DHAP enamine and GAP yields 
the product Schiff base (Fig. 4) that is released to trigger the next 
conformational rearrangement in preparation for the phosphatase 
step. As the aldolase loop flips outward, the anchor loop changes back 
to the ‘in’ conformation, and one of its key residues, Y358, forms 
hydrogen bonds to the C4 OH group and the C6 phosphate of FBP 
(Figs 1c and 2c and Supplementary Fig. 3c). The phosphatase lid 
switches to ‘closed’, preventing access to the active site cavity. 
Concomitantly, the aldolase loop attains its ‘locked’ conformation, 
wherein K232 forms a hydrogen bond to the backbone carbonyl group 
of P111 at the base of the phosphatase loop (Supplementary Fig. 4). In 
effect, the position of the neighbouring D233 inverts, and its β-carboxy
group becomes key to the creation of an additional binding site for
Mg4. Two water ligands at Mg4 are hydrogen-bonded to the side chain
of E357 in the anchor loop and the four Mg²⁺ cations of this functional
state are tightly grouped around the C1 phosphate of FBP (Supplementary
Fig. 3b). The crystal structure, at a resolution of 1.3 Å,
indeed shows a clear geometric distortion around the phosphorus
atom, resulting in an O-P-O angle of approximately 90° for the two
oxygen atoms coordinating Mg4 (Fig. 2c). Note that before the binding
of Mg4 the three other Mg²⁺ ions are bound on the same side of the
Figure 3 | Catalytic activity of distinct variants of FBP aldolase/
phosphatase. The two activities of the enzyme are tightly linked to distinct 
amino-acid residues and can be inactivated separately. Mutagenesis of K232 or 
Y229 on the aldolase loop abolishes aldolase activity while phosphatase activity 
is only slightly reduced (Y229F) or is even enhanced (K232R). In contrast, E357 
and Y358 on the anchor loop are essential for phosphatase activity, but 
alterations here (E357Q, Y358F) do not affect aldolase activity. A D233N 
mutant protein was impaired in both aldolase and phosphatase activity. WT, 
wild type. 
Figure 4 | Proposed reaction mechanism of FBP aldolase/phosphatase. The
steps of the aldol condensation reaction are boxed in green, those of the
phosphatase reaction in blue. Crystal structures are available for the
intermediates in solid boxes, whereby the lettering follows Figs 1 and 2. The
aldolase loop is indicated by a red line, the anchor loop by a yellow line. 


phosphate group (Fig. 2b), thereby counteracting the formation of the 
trigonal bipyramidal transition state of phosphate ester hydrolysis. 
Variants of the anchor loop residues E357Q and Y358F exhibit 
unaltered aldolase activity, whereas phosphatase activity is fully 
abolished (Fig. 3). As described for the S. tokodaii enzyme, the struc- 
ture allows the identification of a water molecule coordinated by Mg2 
and Mg3 that conducts an in-line SN2 nucleophilic attack onto the
phosphorus atom (Supplementary Fig. 2). This leads to an inversion of 
the ligand environment and the cleavage of the phosphoester bond to 
the sugar substrate. 

At this point, the particular mechanistic intricacy of TnFBPAP 
becomes apparent. After binding of DHAP, the aldolase and phosphatase 
reaction steps were supported by the subsequent addition of tightly 
coordinated Mg²⁺ ions. The metal triad or tetrad was organized and
stabilized by the phosphate group at its centre, but as this phosphate 
inverts its geometry upon hydrolysis it no longer fits the arrangement of 
the surrounding cations. The significant free enthalpy of the hydrolysis 
reaction provides the driving force to overcome the stability of the com- 
plex and disassemble the centre. These changes lead back to the structure 
of the complex with the product F6P in the open-chain form. This 
structure contains only two Mg²⁺ ions and shows all three loops in
the conformations observed in the ligand-free form of the enzyme 
(Figs 1d and 2d and Supplementary Fig. 3d). Inorganic phosphate itself 
was not found to bind in crystal soaking experiments. 
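
The four crystallographic states described above can be summarized as a small lookup table. The sketch below is only a bookkeeping aid built from the loop conformations and Mg²⁺ counts stated in the text; the class name `ActiveSiteState`, the field names and the helper `changed_loops` are hypothetical and not part of the original work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActiveSiteState:
    label: str            # panel letter used in Figs 1 and 2
    ligand: str           # what occupies the active site
    mg_ions: int          # number of bound Mg2+ ions
    aldolase_loop: str    # 'in', 'out' or 'locked'
    phosphatase_lid: str  # 'open' or 'closed'
    anchor_loop: str      # 'in' or 'out'


# Values transcribed from the description of the four structures.
CYCLE = (
    ActiveSiteState("a", "none", 2, "out", "open", "in"),
    ActiveSiteState("b", "DHAP (Schiff base)", 3, "in", "open", "out"),
    ActiveSiteState("c", "FBP", 4, "locked", "closed", "in"),
    ActiveSiteState("d", "F6P", 2, "out", "open", "in"),
)

LOOPS = ("aldolase_loop", "phosphatase_lid", "anchor_loop")


def changed_loops(before: ActiveSiteState, after: ActiveSiteState) -> list:
    """Names of the loops whose conformation differs between two states."""
    return [name for name in LOOPS if getattr(before, name) != getattr(after, name)]


if __name__ == "__main__":
    for before, after in zip(CYCLE, CYCLE[1:]):
        print(
            f"{before.label} -> {after.label}: "
            f"Mg {before.mg_ions} -> {after.mg_ions}, "
            f"loops changed: {', '.join(changed_loops(before, after))}"
        )
```

Laid out this way, it is easy to see that the product complex returns all three loops to the conformations of the ligand-free enzyme.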
TnFBPAP catalyses a multi-step reaction by remodelling its active
site according to the respective catalytic requirements (Fig. 4). At first 
glance, this seems more elaborate than to combine different enzymatic 
modules into a multi-enzyme complex as seen, for example, in 
pyruvate dehydrogenase*”. Yet TnFBPAP likely represents an 
ancestral gluconeogenic enzyme and the functional optimizations 
observed are an adaptation to high-temperature environments, where 
the instability of the substrates DHAP and GAP presents a serious 
problem’. Large conformational changes and the coupling to the 
exergonic phosphate hydrolysis allow the enzyme to render the 
aldolase reaction irreversible to assure that the cellular levels of the 
sensitive triose phosphates can be kept low. Besides its evolutionary 
impact, this enzyme sets an elucidatory example for how consecutive 
dramatic conformational changes can reorganize an active centre to 
perform two drastically different catalytic steps in a highly controlled 
and ordered sequence. 


METHODS SUMMARY 


C-terminally His,-tagged TnFBPAP (Tneu_0133) was heterologously produced 
in Escherichia coli as described previously~. Crystals were grown at 20°C by the 
hanging-drop vapour-diffusion method. Two microlitres of protein solution were 
mixed with 2 µl of a reservoir solution containing 8% (w/v) of polyethylene glycol
3350 and 0.1 M HEPES/NaOH at pH 7.0-8.0. The drops were equilibrated against 
reservoir solution without added protein. The crystals were briefly soaked in 15% 
(v/v) of 2R,3R-butane diol for cryoprotection before being plunged into liquid 
nitrogen. For the preparation of complexes with the compounds dihydroxyacetone 
phosphate, fructose-1,6-bisphosphate and fructose-6-phosphate, the native crys- 
tals were soaked in 15% (v/v) of 2R,3R-butane diol and 100 mM of the respective 
substrate for 20 min before being treated for cryo-protection as described above. 
Diffraction data were collected at beamline X06SA at the Swiss Light Source 
(Villigen, Switzerland) at an X-ray wavelength of 1.0 Å. The native structure was
refined to Rcryst = 0.165 and Rfree = 0.188 at a resolution of 1.52 Å. The crystal
structures in complex with DHAP, FBP and F6P were solved by molecular replace- 
ment with the native structure as the search model (Supplementary Table 1). 
Activity assays for the Cenarchaeum symbiosum and T. neutrophilus enzymes were 
performed at 40°C and 48 °C, respectively, using a coupled spectrophotometric 
assay as described previously’. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


Cloning, expression and purification. C-terminally His,-tagged TnFBPAP 
(Tneu_0133) was heterologously produced in E. coli as described previously’. 
Frozen E. coli cells (30 g wet mass) were suspended in 30 ml re-suspension buffer
(20 mM Tris/HCl pH 7.8, 20 mM MgCl₂, 0.5 mM phenylmethanesulphonyl
fluoride) containing 0.1 mg ml⁻¹ DNase I and passed twice through a French
pressure cell at 137 MPa. The cell lysate was centrifuged at 150,000g for 1 h
(4 °C). The cell extract (10 ml) was loaded onto a 1 ml Ni²⁺-chelating Sepharose
affinity column (GE Healthcare) equilibrated with running buffer (20 mM Tris/
HCl pH 7.8, 50 mM MgCl₂ and 200 mM NaCl) at a flow rate of 1 ml min⁻¹. The
column was washed with running buffer containing 70 mM imidazole and
developed with a linear gradient of imidazole. Recombinant TnFBPAP eluted at
a concentration of 275 mM imidazole. Active fractions were pooled and the buffer
was exchanged for 50 mM 3-(N-morpholino)propanesulphonate (MOPS) buffer
at pH 8.0 with 10 mM MgCl₂ using a PD-10 column (GE Healthcare). For subsequent
heat precipitation, the enzyme was incubated for 30 min at 80 °C, cooled
on ice for 15 min and centrifuged at 17,000g for 30 min (4 °C). For crystallization,
the supernatant with purified protein was concentrated by ultrafiltration to a final
concentration of 7 mg ml⁻¹.

Crystallization. Crystals were grown at 20 °C by the hanging-drop vapour-diffusion
method. Two microlitres of protein solution were mixed with 2 µl of a
reservoir solution containing 8% (w/v) of polyethylene glycol 3350 and 0.1 M
HEPES/NaOH at pH 7.0-8.0. The drops were equilibrated against reservoir solu- 
tion without added protein. Native crystals appeared after 1h and reached their 
maximum size after approximately 3 days. The crystals were briefly soaked in 15% 
(v/v) of 2R,3R-butane diol for cryoprotection before being plunged into liquid 
nitrogen. For the preparation of complexes with the compounds dihydroxyacetone 
phosphate, fructose-1,6-bisphosphate and fructose-6-phosphate, the native crys- 
tals were soaked in 15% (v/v) of 2R,3R-butane diol and 100 mM of the respective 
substrate for 20 min before being treated for cryo-protection as described above. 
Data collection and structure determination. Diffraction data were collected at
beamline X06SA at the Swiss Light Source (Villigen, Switzerland) at an X-ray
wavelength of 1.0 Å. The crystals of TnFBPase belonged to the tetragonal space
group I422 with one monomer per asymmetric unit. Data were indexed and
integrated using MOSFLM and scaled with SCALA. Structure solution was
carried out by molecular replacement using the program MOLREP and the
structure of the homologous enzyme from S. tokodaii (PDB-ID 1UMG) as the
initial search model. Refinement was performed with REFMAC5 (ref. 33) and
model building was performed using Coot. The native structure was refined to
Rcryst = 0.165 and Rfree = 0.188 (ref. 35) at a resolution of 1.52 Å. The crystal
structures in complex with DHAP, FBP and F6P were solved by molecular replacement
with the native structure as the search model (Supplementary Table 1).
Site-directed mutagenesis. Mutagenesis was performed on the synthetic 
C. symbiosum gene’. The first mutations were introduced into the expression 
vector pT7-7 (ref. 36) carrying the gene (FBP_C.symb-X-pT7-7 (ref. 2)) by reverse 
PCR using a single mutagenic oligonucleotide’. PCR conditions were as follows: 25 
cycles of 20 s denaturation at 98 °C, 20s primer annealing, and elongation at 72 °C 
using Phusion DNA polymerase (New England Biolabs). The PCR products were 
incubated at 37 °C with 20 U of DpnI for 3 h to digest the methylated template
plasmid. After amplification of the plasmid in E. coli DH5α, the mutation was
confirmed by sequencing. Competent E. coli BL21 (DE3) Rosetta2 cells (Novagen)
were transformed with the corresponding plasmid and grown at 37 °C in 1-l flasks
with self-inducing medium containing 100 µg ml⁻¹ ampicillin and 34 µg ml⁻¹
chloramphenicol. After 5-6 h at 37 °C, the temperature was lowered to 20 °C
and the culture was grown for 14h at 20 °C. Cells were collected by centrifugation 
and stored in liquid nitrogen until further processing. 

Production of FBP aldolase/phosphatase mutants from C. symbiosum. 
Preparation of cell extract from 3-6 g wet mass of frozen E. coli cells and affinity 
chromatography were performed following the procedure described above for the 
T. neutrophilus protein. 

FBP aldolase/phosphatase enzyme assays. Activity assays for the C. symbiosum 
and T. neutrophilus enzymes were performed at 40°C and 48 °C, respectively, 
using a coupled spectrophotometric assay as described previously’. 
Structural basis for the bifunctionality of
fructose-1,6-bisphosphate aldolase/phosphatase 


Shinya Fushinobu, Hiroshi Nishimasu, Daiki Hattori, Hyun-Jin Song & Takayoshi Wakagi


Enzymes catalyse specific reactions and are essential for maintain- 
ing life. Although some are referred to as being bifunctional, they 
consist of either two distinct catalytic domains or a single domain 
that displays promiscuous substrate specificity’. Thus, one enzyme 
active site is generally responsible for one biochemical reaction. In 
contrast to this conventional concept, archaeal fructose-1,6- 
bisphosphate (FBP) aldolase/phosphatase (FBPA/P) consists of 
a single catalytic domain, but catalyses two chemically distinct 
reactions of gluconeogenesis: (1) the reversible aldol condensation 
of dihydroxyacetone phosphate (DHAP) and glyceraldehyde- 
3-phosphate (GA3P) to FBP; (2) the dephosphorylation of FBP 
to fructose-6-phosphate (F6P)”. Thus, FBPA/P is fundamentally 
different from ordinary enzymes whose active sites are responsible 
for a specific reaction. However, the molecular mechanism by 
which FBPA/P achieves its unusual bifunctionality remains 
unknown. Here we report the crystal structure of FBPA/P at 
1.5-Å resolution in the aldolase form, where a critical lysine residue
forms a Schiff base with DHAP. A structural comparison of the 
aldolase form with a previously determined phosphatase form’ 
revealed a dramatic conformational change in the active site, 
demonstrating that FBPA/P metamorphoses its active-site 
architecture to exhibit dual activities. Thus, our findings expand 
the conventional concept that one enzyme catalyses one bio- 
chemical reaction. 

FBPA/P was initially identified as class V fructose-1,6-bisphospha- 
tase (FBPase)*, and is responsible for gluconeogenesis in the hyperther- 
mophilic archaeon Thermococcus kodakaraensis*’. Class V FBPases 
lack sequence homology with other FBPase or FBP aldolase (FBPA) 
proteins, and are found in the genomes of virtually all Archaea and 
deeply branching Bacteria (Supplementary Fig. 1). FBPA/P shows 
higher activity in the condensation reaction than in the FBP cleavage 
reaction® (Supplementary Fig. 2), indicating its involvement in gluco- 
neogenesis, rather than in glycolysis. Thus, FBPA/P has been proposed 
to represent an ancestral gluconeogenic enzyme that ensures uni- 
directional gluconeogenesis in chemolithoautotrophic organisms’. 
We previously determined the crystal structure of an FBPA/P protein 
from Sulfolobus tokodaii (ST0318) at 1.8-Å resolution, in complex with
FBP and catalytically essential Mg²⁺ ions, which represents the ‘FBPase
form’. The structure revealed a unique fold and lacked similarity to
those of other known FBPase and FBPA proteins. Unexpectedly, bio- 
chemical experiments indicated that a conserved lysine residue (Lys 232 
of ST0318), which is located away from the bound FBP molecule in the 
ST0318 structure, forms a Schiff base with DHAP during the FBPA 
reaction’, suggesting that FBPA/P undergoes a large structural change 
for its dual activities. However, the molecular mechanism by which 
FBPA/P exhibits its bifunctionality remains unknown. Here, we present 
the crystal structure of ST0318 in the DHAP-Schiff base intermediate 
state, which represents the ‘FBPA form’. A comparison of the high-resolution
structures of FBPA/P in the aldolase and phosphatase forms
provides the structural basis for its bifunctionality. 


We measured the FBPA and FBPase activities of the purified recom- 
binant ST0318 enzyme, and confirmed that the enzyme exhibits both 
activities (Table 1), demonstrating that ST0318 is also an FBPA/P 
enzyme. We crystallized ST0318 in the presence of DHAP and 
Mg²⁺, and determined the crystal structure at 1.5-Å resolution
(Fig. 1a, Supplementary Fig. 3a and Supplementary Table 1). The
crystal belongs to the space group I422, and is isomorphous to the
previously determined FBPase form of ST0318 (PDB code 1UMG).
The monomer structures are almost identical in the FBPA and FBPase
forms, with root mean square deviations of 3.28 and 0.29 Å for the Cα
atoms of the overall polypeptide and of the regions without the three
mobile loops (residues 97-110, 219-233 and 346-361), respectively. As in the
FBPase form, the protomers in the asymmetric units are related by
crystallographic symmetry to form an octamer (Supplementary Fig. 4).
Thus, the tertiary and quaternary structures are virtually the same in
these two forms. We observed an electron density for the bound 
DHAP in the active site, located at the dimer interface of the ring- 
shaped tetramer (Supplementary Fig. 5). As previously postulated’, 
Lys 232 formed a Schiff base intermediate with DHAP (Fig. 1a and
Supplementary Fig. 3a). The Lys 232 Nζ atom and the DHAP C1, C2
and C3 atoms adopt a nearly planar arrangement, indicating that the
intermediate is similar to the imine (iminium) form observed in a
typical class I FBPA, for example rabbit muscle FBPA (rmFBPA).
The intermediate may have been trapped, owing to the absence of
the second substrate, GA3P. The DHAP phosphate group coordinates
three Mg²⁺ ions (Mg2-Mg4) (Supplementary Fig. 6), whereas the FBP
1-phosphate group coordinates four Mg²⁺ ions (Mg1-Mg4) in the
FBPase form. The DHAP hydroxyl group is recognized by Arg 266
and Asp 287 through direct hydrogen bonds, and by Gln 242’ (the 
prime symbol indicates residues from the neighbouring protomer) 
through a water-mediated hydrogen bond. These hydrogen-bonding 
interactions are likely to be critical for the discrimination between 
DHAP and GA3P, as GA3P has a carbonyl group and lacks a hydroxyl 


Table 1 | Activities of wild type and mutants of ST0318

Enzyme                        kcat (s⁻¹)        Km (mM)          kcat/Km (s⁻¹ mM⁻¹)
FBPase
  Wild type                   0.62 ± 0.02       0.027 ± 0.003    23
  Y229F                       0.66 ± 0.02       0.027 ± 0.003    25
  Y348F                       0.26 ± 0.01       0.036 ± 0.002    7.2
FBPA (anabolic direction)
  Wild type                   0.91 ± 0.04       0.19 ± 0.02      4.7
  Y229F                       ND*               -                -
  Y348F                       0.10 ± 0.01       0.34 ± 0.04      0.29
FBPA (catabolic direction)
  Wild type                   0.027 ± 0.011†    -                -
  Y229F                       ND*               -                -
  Y348F                       0.026 ± 0.004†    -                -

* ND, not detected.
† Enzyme activity was measured with 5 mM FBP.
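
The catalytic efficiencies in Table 1 can be cross-checked by dividing each turnover number by its Michaelis constant. The sketch below does this from the rounded values printed in the table, so the last digit may differ slightly from the published column; the dictionary layout and the function name `catalytic_efficiency` are illustrative assumptions. For wild-type FBPA in the anabolic direction the quotient comes out a little below 5 s⁻¹ mM⁻¹, which suggests the tabulated entry carries a decimal point.

```python
# (activity, enzyme) -> (kcat in s^-1, Km in mM), rounded values from Table 1.
TABLE_1 = {
    ("FBPase", "Wild type"): (0.62, 0.027),
    ("FBPase", "Y229F"): (0.66, 0.027),
    ("FBPase", "Y348F"): (0.26, 0.036),
    ("FBPA (anabolic)", "Wild type"): (0.91, 0.19),
    ("FBPA (anabolic)", "Y348F"): (0.10, 0.34),
}


def catalytic_efficiency(kcat_per_s: float, km_mM: float) -> float:
    """Return kcat/Km in s^-1 mM^-1."""
    if km_mM <= 0:
        raise ValueError("Km must be positive")
    return kcat_per_s / km_mM


if __name__ == "__main__":
    for (activity, enzyme), (kcat, km) in TABLE_1.items():
        print(f"{activity:16s} {enzyme:10s} kcat/Km = {catalytic_efficiency(kcat, km):.2f}")
```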
Figure 1 | Active sites of ST0318 in the two forms. a, b, DHAP-Schiff base
complex (the FBPA form) (a) and FBP complex (the FBPase form, PDB code 
1UMG)’ (b). DHAP and FBP are coloured green and cyan, respectively. The 
neighbouring protomer is coloured dark grey, and its residues are labelled with 
prime symbols. The lid, Schiff-base and C-terminal loops are coloured blue, 


group at the C3 position. The enzyme displayed the activity under the 
crystallization conditions (albeit approximately 50% compared with 
that under standard conditions), and the dissolved crystals exhibited 
activity comparable to that of the purified enzyme, confirming that the 
FBPA/P-DHAP complex structure determined here represents a 
genuine reaction intermediate. The DHAP molecule is located at a 
position similar to that of the FBP molecule in the FBPase form, 
demonstrating that FBPA/P catalyses both the FBPA and FBPase reac- 
tions at a single site. 
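
The Cα comparison reported earlier for the two forms, with and without the three mobile loops, can be pictured with a small calculation. The sketch below assumes the two coordinate sets are already superposed and are given as dictionaries keyed by residue number; the function names and the toy coordinates are hypothetical, and only the loop ranges are taken from the text.

```python
import math

# Mobile loop ranges quoted in the text (residue numbers, inclusive).
MOBILE_LOOPS = ((97, 110), (219, 233), (346, 361))


def in_mobile_loop(resnum: int) -> bool:
    """True if the residue lies in one of the three mobile loops."""
    return any(lo <= resnum <= hi for lo, hi in MOBILE_LOOPS)


def ca_rmsd(coords_a: dict, coords_b: dict, exclude_loops: bool = False) -> float:
    """RMSD over shared C-alpha positions of two pre-superposed models."""
    shared = [
        r for r in sorted(coords_a.keys() & coords_b.keys())
        if not (exclude_loops and in_mobile_loop(r))
    ]
    if not shared:
        raise ValueError("no residues left to compare")
    squared = sum(math.dist(coords_a[r], coords_b[r]) ** 2 for r in shared)
    return math.sqrt(squared / len(shared))


if __name__ == "__main__":
    # Toy data, not real coordinates: residues 95-100, of which 97-100
    # (inside the first loop range) are displaced by 3 A in the second model.
    form_a = {r: (3.8 * r, 0.0, 0.0) for r in range(95, 101)}
    form_b = {r: (3.8 * r, 3.0 if r >= 97 else 0.0, 0.0) for r in range(95, 101)}
    print(f"all residues:   {ca_rmsd(form_a, form_b):.2f}")
    print(f"loops excluded: {ca_rmsd(form_a, form_b, exclude_loops=True):.2f}")
```

In the toy data the deviation vanishes once the loop residues are masked, which mirrors the pattern of a large overall deviation and a small one for the rigid core.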

A structural comparison of the FBPA and FBPase forms reveals 
striking differences in three loop regions at the active site: a lid loop 
(residues 97-110), a Schiff-base loop (residues 219-233) that contains 
the Schiff base-forming Lys232 residue, and a carboxy (C)-terminal 
loop (residues 346-361) (Fig. 1). Although we could confidently trace 
the polypeptide chain, except for part of the C-terminal loop (residues 
354-360), the loop regions exhibit relatively higher average B-factors 
than that for the overall polypeptide (Supplementary Table 2), indi- 
cating their flexible nature. In the FBPase form, the lid and C-terminal 
loops interact with FBP, whereas the Schiff-base loop is further away 
from the active site and does not contact FBP’ (Fig. 1b and 
Supplementary Fig. 3b). In the FBPA form, the lid and C-terminal 
loops are displaced outwards from the active site and do not contact 
DHAP, whereas the Schiff-base loop cuts into the active site, allowing 
Tyr229 and Lys232 to interact with DHAP (Fig. 1a and Supplementary
Fig. 3a). Although Asp 234 similarly holds Mg2 in the
two forms, Asp 233, which holds the catalytically essential Mg1 in
the FBPase form, is flipped out in the FBPA form, resulting in the
dissociation of Mg1.

A structural comparison of ST0318 in the two forms with rmFBPA 
in the DHAP- and FBP-Schiff base intermediate states®’ reveals an 
unexpected similarity in the active-site configurations between FBPA/ 
P and class I FBPA (Fig. 2a and Supplementary Fig. 7a). Notably, the 
covalent DHAP intermediates of ST0318 and rmFBPA superimpose 
well, even though the Cα atoms of their Schiff base-forming lysine
residues are far apart, owing to their distinct overall structures 
yellow and pink, respectively. In (a), Mg²⁺ ions (Mg2-4) and a water molecule
are shown as green and red spheres, respectively. Hydrogen bonds are shown as
black dashed lines. In (b), Mg²⁺ ions (Mg1 and Mg2-4) are shown as magenta
and cyan spheres, respectively. The disordered region is indicated by a yellow 
dashed line. 


(Supplementary Fig. 8). Based on this analogy, we propose a mech- 
anism for FBPA/P-catalysed reactions (Fig. 3). The aldol condensation 
reaction consists of dehydration, carbanion formation, C3-C4 bond 
formation and hydration steps, in which general acid/base catalysts 
participate. In rmFBPA, Glu 187 reportedly functions as the general 
acid/base residue in multiple steps’. In addition, Asp 33 (ref. 8) or 
Tyr 363 (ref. 6) may be involved in the second carbanion formation 
step, by accepting a proton from the C3 methylene of the DHAP- 
Schiff base intermediate. Tyr 229 and Asp 287 in ST0318 are located 
at positions similar to those of Glu 187 and Asp 33 in rmFBPA (Fig. 2a 
and Supplementary Fig. 7a). Tyr 229 is located in the vicinity of the 
Lys 232 Nζ (3.2 Å), the DHAP C2 (3.0 Å) and C3 (3.3 Å), and the
superposed FBP O4 (2.6 Å) atoms, whereas Asp 287 is further away
from the DHAP C2 (4.9 Å) and C3 (3.9 Å) atoms (Fig. 2b and
Supplementary Fig. 7b). Moreover, Tyr 229 is completely conserved 
in the FBPA/P proteins, whereas Asp 287 is replaced with a cysteine 
residue in the Thermus thermophilus and Cenarchaeum symbiosum 
FBPA/Ps (Supplementary Fig. 1). Indeed, the Y229F mutant retained 
the FBPase activity, but lost the FBPA activity (Table 1). These obser- 
vations suggest that Tyr 229 serves as the catalytic acid/base residue for 
all steps (Fig. 3). In contrast to the FBPase form, the active site is 
accessible to the solvent in the FBPA form, and thus seems suitable 
for the hydration reaction. A comparison of the high-resolution struc- 
tures of the two forms provides insight into the GA3P recognition 
mechanism. In the FBPA form, two water molecules form hydrogen 
bonds with Tyr 229 and Asp 287, and they are located at positions 
equivalent to the FBP 4- and 5-hydroxyl groups in the FBPase form, 
respectively (Fig. 2b and Supplementary Fig. 7b). In addition, the 
residues involved in the recognition of the FBP 6-phosphate group 
(His 19, Tyr 91, Gln 242’ and His 243’) are located at similar positions 
in the two forms. Thus, it is likely that the GA3P phosphate group is 
recognized by His 19, Tyr 91, Gln 242’ and His 243’, and the GA3P 
carbonyl and hydroxyl groups are recognized by Tyr 229 and Asp 287, 
respectively. The nucleophilic water molecule for the FBPase reaction 
is already bound to the FBPA form, and is held by Mg2 and Mg3 
Figure 2 | Catalytic components of FBPA. a, Superimposition of the FBPA
(green) and FBPase forms (cyan, PDB code 1UMG) of ST0318, and the DHAP-
Schiff base (enamine) intermediate (black, PDB code 2QUT) and the FBP-
Schiff base intermediate (pericyclic transition state) of wild-type rmFBPA
(white, PDB code 1ZAI), and Tyr 363 of the rmFBPA K146M mutant (semi-
transparent grey, PDB code 2QUU). The residues of ST0318 and rmFBPA are
labelled with green and black characters, respectively. The Mg²⁺ ion (Mg1) in
the FBPase form is shown as a magenta sphere. b, Superimposition of the FBPA


(Supplementary Fig. 6). In addition, the catalytic base for the FBPase 
reaction, Asp 12, is located at a similar position in both forms. 
However, DHAP is protected from dephosphorylation, probably
because of the lack of Mg1, which is required for stabilizing the
developing negative charge after phosphate cleavage in the FBPase
reaction’. Hydrolysis of the Schiff base releases the loop containing
Lys 232 from the active site, thereby enabling the enzyme to bind Mg1.
In addition, the closure of the lid and C-terminal loops would be 
important for the FBPase reaction, as they stabilize FBP binding. 
Gly 104 and Asn 105 on the lid loop interact with the FBP 6- and 
1-phosphate groups, respectively, whereas Tyr 348 on the C-terminal 
loop interacts with the FBP 4-hydroxyl and 6-phosphate groups’. 




(green) and FBPase forms (semi-transparent cyan, PDB code 1UMG) of 
ST0318. Water molecules in the FBPA form are shown as red spheres. The 
interactions in the FBPA and FBPase forms are depicted by green and cyan 
dashed lines, respectively, and the distances (in angstroms) between the protein 
atoms (Tyr 229 or Asp 287) and the DHAP-Schiff base are shown (green). The 
distances (in angstroms) between Asp 287 and the DHAP C2 and C3 atoms and
between Tyr 229 and the FBP O4 atom are also shown with black dashed lines.


In summary, the crystal structures of FBPA/P in the two forms 
revealed that FBPA/P achieves its bifunctionality by transforming its 
active-site architecture, through the toggle switch-like motions of the 
three mobile loop regions. Conformational fluctuations, for example 
loop movements and domain motions, are essential for substrate bind- 
ing and product release in enzyme functions” “’. A recent study showed 
that a His/Trp biosynthesis isomerase, PriA, exhibits bisubstrate spe- 
cificity through a substrate-induced metamorphosis of the active-site 
architecture!*. However, the role of conformational fluctuations in 
enzyme catalysis and bifunctionality has been less clear. To our 
knowledge, this study is the first to elucidate the molecular mechanism 
by which an enzyme catalyses multiple chemical reactions at a single
site. Furthermore, our findings suggest the possible existence of undiscovered
enzymes that also catalyse multiple chemical reactions at a
single site.

Figure 3 | Proposed mechanism of the FBPA and FBPase reactions catalysed by ST0318. The crystal structure determined here represents the imine (iminium) intermediate.


METHODS SUMMARY 


The wild-type and mutant ST0318 proteins were expressed in Escherichia coli and 
purified to homogeneity, as described previously’. Crystallization was performed 
at 25 °C, using the sitting-drop vapour-diffusion method. Crystals were obtained 
by mixing 1 µl of protein solution, consisting of 14 mg ml⁻¹ ST0318, 20 mM Tris-
HCl (pH 7.5), 5 mM DHAP and 5 mM MgCl₂, and 1 µl of reservoir solution,
consisting of 0.1 M Bicine-KOH (pH 9.0), 10% PEG 20,000 and 2% dioxane.
X-ray diffraction data were collected at the NW12A station (λ = 1.0 Å) at the
Photon Factory AR, High Energy Accelerator Research Organization, Tsukuba, 
Japan. The crystal structure of ST0318 in the FBPase form (PDB code 1UMG) was 
used as the initial model for refinement. Data collection and refinement statistics 
are provided in Supplementary Table 1. Site-directed mutagenesis was performed 
with a PrimeSTAR mutagenesis kit (Takara Bio). The FBPA and FBPase activities 
were both measured at 48°C using a coupled spectrophotometric assay, as 
described previously’. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


Protein preparation and crystallography. Wild-type and mutant ST0318 
proteins were expressed in E. coli and purified to homogeneity, as described
previously*. Crystallization was performed at 25 °C, using the sitting-drop vapour-
diffusion method. Crystals were obtained by mixing 1 µl of protein solution,
consisting of 14 mg ml⁻¹ ST0318, 20 mM Tris-HCl (pH 7.5), 5 mM DHAP and
5 mM MgCl₂, and 1 µl of reservoir solution, consisting of 0.1 M Bicine-KOH (pH
9.0), 10% PEG 20,000 and 2% dioxane. X-ray diffraction data were collected at the
NW12A station (λ = 1.0 Å) at the Photon Factory AR, High Energy Accelerator
Research Organization, Tsukuba, Japan. Crystals were cryoprotected in the res- 
ervoir solution supplemented with 25% 2-methyl-2,4-pentanediol, and were flash- 
cooled at 100 K in a stream of nitrogen gas. Data were processed using HKL2000 
(ref. 13). The previously determined ST0318 structure (PDB code 1UMG) was used 
as the initial model for refinement. Manual model rebuilding and refinement were 
performed using Coot (ref. 14) and Refmac5 (ref. 15). The final model contains residues
2-353 and 361-364, 264 water molecules, one 2-methyl-2,4-pentanediol, three
Mg²⁺ ions and one DHAP molecule (O2 atom dehydrated) covalently attached
to Lys 232. Data collection and refinement statistics are provided in Supplementary 
Table 1. Molecular graphic images were prepared using PyMol (Delano Scientific). 
Site-directed mutagenesis and enzyme assay. Site-directed mutagenesis was 
performed with a PrimeSTAR mutagenesis kit (Takara Bio). The FBPA and 
FBPase activities were both measured at 48 °C using a coupled spectrophotometric 
assay, as described’. 

FBPase assay. FBP-dependent fructose-6-phosphate formation was measured 
by coupling the reaction with exogenous phosphoglucose isomerase and glucose- 
6-phosphate dehydrogenase, and NADPH formation was monitored at 340 nm
(ε340nm NADPH = 6,300 M⁻¹ cm⁻¹). The assay mixture (0.5 ml) consisted of
0.1 M Tris-HCl (pH 7.8), 20 mM MgCl₂, 20 mM dithiothreitol (DTT), 0.5 mM
NADP⁺, 0.01-0.3 mM FBP and 1 U each of phosphoglucose isomerase and
glucose-6-phosphate dehydrogenase from baker’s yeast (Sigma-Aldrich). The 
reaction was started by the addition of the purified enzyme. 
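
As an illustration of how a coupled spectrophotometric readout of this kind is usually converted into a reaction rate, the following minimal sketch applies the Beer-Lambert law to an absorbance slope. The function name, the 1 cm path length and the example slope of 0.063 absorbance units per minute are assumptions made for illustration; only the extinction coefficient (6,300 M⁻¹ cm⁻¹) and the 0.5 ml assay volume come from the text.

```python
def rate_from_absorbance(delta_a_per_min, epsilon_m_cm, path_cm=1.0, volume_ml=0.5):
    """Convert an absorbance slope into a reaction rate via the Beer-Lambert law.

    Returns (rate in micromolar per minute, rate in nmol per minute).
    The 1 cm path length is an assumed default, not a value from the text.
    """
    # A = epsilon * c * l, so dc/dt = (dA/dt) / (epsilon * l), in mol per litre per minute.
    rate_molar = delta_a_per_min / (epsilon_m_cm * path_cm)
    rate_um = rate_molar * 1e6
    # micromol per litre times millilitres gives nmol.
    rate_nmol = rate_um * volume_ml
    return rate_um, rate_nmol


# Hypothetical slope of 0.063 A340 per minute with the NADPH coefficient quoted above.
rate_um, rate_nmol = rate_from_absorbance(0.063, 6300.0)
print(f"{rate_um:.2f} uM/min, {rate_nmol:.2f} nmol/min")
```

With these illustrative inputs the rate is about 10 µM per minute, or about 5 nmol per minute in the 0.5 ml assay. The same function would apply to the NADH-based assay by passing 3,400 M⁻¹ cm⁻¹ as the coefficient.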

FBPA assay (anabolic direction). Triosephosphate-dependent fructose-6-
phosphate formation was measured by coupling the reaction with exogenous
triosephosphate isomerase, phosphoglucose isomerase and glucose-6-phosphate
dehydrogenase from baker’s yeast (Sigma-Aldrich); NADPH formation was
monitored at 340 nm. The assay mixture (0.5 ml) consisted of 0.1 M Tricine-
KOH (pH 8.0), 20 mM MgCl₂, 20 mM DTT, 0.5 mM NADP⁺, 4 U of triosephosphate
isomerase, 1 U of phosphoglucose isomerase and 1 U of glucose-
6-phosphate dehydrogenase. The assay mixture was preincubated for 4 min.
After the addition of GA3P (0.04-0.35 mM), the assay mixture was further incu- 
bated for 1 min to achieve equilibrium between GA3P and DHAP. The reaction 
was started by the addition of the purified enzyme. 

FBPA assay (catabolic direction). FBP-dependent formation of triosephosphates
was measured by coupling the reaction with triosephosphate isomerase
from baker’s yeast and glycerolphosphate dehydrogenase from rabbit (Sigma-
Aldrich), and the oxidation of NADH was monitored at 365 nm (ε365nm
NADH = 3,400 M⁻¹ cm⁻¹). The assay mixture (0.5 ml) consisted of 0.1 M
Tricine-KOH (pH 8.0), 20 mM MgCl₂, 20 mM DTT, 0.55 mM NADH, 5 mM
FBP, 20 U of triosephosphate isomerase and 2 U of glycerolphosphate dehydrogenase.
The reaction was started by the addition of the purified enzyme.

To examine whether the enzyme is active under the crystallization conditions, 
we measured the FBPA activity (anabolic direction) in an assay mixture (0.5 ml) 
consisting of 0.1 M Bicine-KOH (pH 9.0), 10% PEG 20,000, 2% dioxane, 5 mM
MgCl₂, 0.5 mM NADP⁺, 0.5 mM GA3P, 0.5 mM DHAP, 20 U of triosephosphate
isomerase, 5 U of phosphoglucose isomerase and 5U of glucose-6-phosphate 
dehydrogenase. To examine whether the FBPA/P-DHAP complex represents a 
genuine reaction intermediate, we washed several crystals using the crystallization 
buffer, dissolved them in water and then measured the FBPA activity (anabolic 
direction). 
oscillation mode. Methods Enzymol. 276, 307-326 (1997).
structures by the maximum-likelihood method. Acta Crystallogr. D 53, 240-255
(1997).


©2011 Macmillan Publishers Limited. All rights reserved 




doi:10.1038/nature10503 


Saccharomyces cerevisiae THI4p is a suicide 
thiamine thiazole synthase 


Abhishek Chatterjee, N. Dinuka Abeydeera, Shridhar Bale, Pei-Jing Pai,
Thiamine pyrophosphate 1 is an essential cofactor in all living
systems’. Its biosynthesis involves the separate syntheses of the 
pyrimidine 2 and thiazole 3 precursors, which are then coupled’. 
Two biosynthetic routes to the thiamine thiazole have been iden- 
tified. In prokaryotes, five enzymes act on three substrates to pro- 
duce the thiazole via a complex oxidative condensation reaction, 
the mechanistic details of which are now well established”*. In 
contrast, only one gene product is involved in thiazole biosynthesis 
in eukaryotes (THI4p in Saccharomyces cerevisiae)’. Here we 
report the preparation of fully active recombinant wild-type 
THI4p, the identification of an iron-dependent sulphide transfer 
reaction from a conserved cysteine residue of the protein to a 
reaction intermediate and the demonstration that THI4p is a 
suicide enzyme undergoing only a single turnover. 

While analysing the metabolic pathway leading to thiamine pyrophosphate
1 (Fig. 1a), we identified three adenylated metabolites
(structures 5, 12 and 17 in Fig. 1b), co-purifying with THI4p. They
provide three molecular snapshots of the reaction pathway catalysed 
by this protein. In addition, two partially active mutants were iden- 
tified (C204A and H200N), which catalysed the conversion of NAD 
(nicotinamide adenine dinucleotide) 6 and glycine 9 to an advanced 
intermediate 12 (ref. 8). A mechanism for thiazole formation, consist- 
ent with these observations, is outlined in Fig. 1b (refs 8-11). However, 
the source of the thiazole sulphur remained elusive, precluding us from 
deciphering the subsequent steps leading to the adenylated thiazole 5. 

High-resolution ESI-FTMS (electrospray ionization Fourier- 
transformed mass spectrometry) analysis of wild-type THI4p (wtTHI4p), 
recombinantly expressed in Escherichia coli, revealed a mass that was 
34 ± 1 Da lower than the calculated mass of the protein. The active site
mutants of THI4p’, which did not copurify with any bound metabolites 
and did not show any activity, were unmodified, indicating that the 34- 
Da mass loss was in some way related to the catalytic activity of the 
protein. To localize the site of this modification, chymotrypsin diges- 
tion of modified wtTHI4p was carried out, followed by MALDI 
(matrix-assisted laser desorption ionization) and ESI-MS analysis of 
the peptide fragments. Before the digestion, free thiol residues of the 
protein were alkylated with iodoacetamide to protect them from oxida- 
tion. As a control, we performed the same procedure on an inactive and 
unmodified THI4p mutant (R301Q)° in parallel. Upon comparing the 
results for the wtTHI4p and the mutant THI4p, two modified peptide 
fragments, spanning the same region of the protein sequence, were 
identified (Fig. 2a and Supplementary Fig. 2). Fragmentation analysis 
localized the modification to a pair of adjacent cysteine residues 
(Cys 204 and Cys 205, Fig. 2a, highlighted in red). Both of these residues 
failed to alkylate during the iodoacetamide treatment of the peptide 
fragments. In contrast, peptide fragments originating from THI4p 
(R301Q) under the same conditions were completely alkylated 
(Fig. 2a). These observations may be explained by the transfer of 
H2S from Cys 204 or Cys 205 of wtTHI4p to a thiazole intermediate, 


Pieter C. Dorrestein, David H. Russell,


generating a dehydroalanine residue (M — 34Da), which is subse- 
quently trapped by the adjacent cysteine-thiol, producing a seven- 
membered cyclic thioether (Fig. 2b and Supplementary Fig. 2). 

On the basis of these observations, the crystal structures of THI4p’
and THI1 (ref. 12; THI4p orthologue from Arabidopsis thaliana) were
reanalysed for evidence of an active site dehydroalanine (DHA) or
cyclic thioether. The 2Fo − Fc and Fo − Fc electron density maps
clearly demonstrate a lack of electron density for the sulphur atom
of Cys 205, consistent with a dehydroalanine residue (Fig. 2c, d). The 
loop containing Ala 199-Asp 207, disordered in our previous struc- 
ture, was completed and shown to extend into the active site of a 
fourfold-related monomer. The interpretation of the high resolution 
Figure 1 | Thiamine pyrophosphate and thiamine thiazole biosynthesis.
a, The late steps in thiamine pyrophosphate biosynthesis. b, Mechanistic 
proposal for the biosynthesis of thiamine-thiazole in eukaryotes catalysed by 
THI4p. 
Figure 2 | Identification of the site of the M − 34 Da modification in
wtTHI4p. a, The peptide fragments originating from wtTHI4p and the R301Q
mutant containing the site of modification. WT, wild type. b, A mechanistic
hypothesis to explain the mass loss and lack of reactivity with iodoacetamide of
the modified peptides. c, Active site of THI4p with bound ADT. The separation
of the sulphur atom of ADT and the Cβ atom of the dehydroalanine residue is


structure of THI1 (PDB ID 1RP0) is also consistent with an active site
dehydroalanine. The 2Fo − Fc and Fo − Fc electron density maps for
this protein confirm that electron density for the sulphur atom of
Cys 172 is missing and demonstrate that the Cα is planar (Supplementary
Fig. 3). The identification of adjacent cysteine/dehydroalanine
residues in THI4p suggests that the formation of the cyclic thioether 
occurs under the denaturing conditions used for the preparation of the 
sample for mass spectrometry analysis. 

The activity-dependent loss of H2S from THI4p indicated that the 
sulphur atom of the thiazole could be derived from THI4p. The result- 
ing modification renders the enzyme inactive and explains our pre- 
vious inability to reconstitute active wtTHI4p. The problem of 
obtaining active wtTHI4p was solved by a surprising set of observa- 
tions regarding the effect of the growth medium on the activity of 
isolated wtTHI4p. When the overexpression strain was grown in M9 
minimal medium instead of Luria-Bertani (LB) medium, the purified 
protein was mostly free of the 34-Da mass modification (Fig. 3b). 
High-performance liquid chromatography (HPLC) analysis of this 
protein preparation demonstrated greatly reduced formation of the 
sulphur-containing metabolites 17 and 5 and the accumulation of 12 
(Fig. 3a, red HPLC trace). This indicates that the sulphur transfer 
chemistry involved in thiazole formation is greatly retarded when 
wtTHI4p is isolated from cells grown in minimal medium. 

Addition of 100 µM iron(II) to the growth medium was sufficient to
restore the sulphide transfer activity of wtTHI4p as indicated by the 
formation of 17 and 5 shown in Fig. 3a (black HPLC trace). In addi- 
tion, the protein thus isolated showed the 34-Da mass loss, as observed 
with wtTHI4p overexpressed in LB medium (Fig. 3b). These observa- 
tions indicated that the THI4p-catalysed sulphur incorporation re- 
actions are iron-dependent. 


5.3 Å. The loop Gln 203-Pro 208 is from a fourfold-related monomer and has
carbon atoms coloured cyan. Water molecules are shown as red spheres. The
electron density map (2Fo − Fc contoured at 3σ) clearly shows the loss of
sulphur from Cys 205 to form the dehydroalanine residue. d, Magnified 
electron density of residue DHA 205 and residue Cys 204. 


These observations suggested conditions for the successful recon- 
stitution of the THI4p-catalysed reaction. To accomplish this, the 
unmodified wtTHI4p, overexpressed in minimal media, was freed from 
bound metabolites by multiple rounds of gel-filtration. This protein 
preparation catalysed the conversion of ADP-ribose (ADPr 7) to 12 in 
the presence of glycine 9, via the intermediate of ADP-ribulose (ADPrl 
8, Fig. 3c). Addition of Fe²⁺ to this reaction mixture resulted in the
conversion of 12 to the final product ADP-thiazole (ADT, 5; Fig. 3c).
Other divalent metal ions (Mg²⁺, Ca²⁺, Mn²⁺, Co²⁺, Ni²⁺, Cu²⁺ and
Zn²⁺) did not activate the enzyme for sulphide transfer chemistry.
Although this reaction mixture did not contain an exogenous sulphur
source, nearly one full turnover could be observed (380 ± 5 µM ADT
from 420 ± 21 µM THI4p). Inclusion of excess sulphide or cysteine in
the reaction mixture did not enhance the turnover number or protect 
the protein from modification. The production of ADT 5 was oxygen- 
sensitive and had to be performed under an anaerobic atmosphere, 
in the presence of the reducing agent dithiothreitol (DTT) or tris-
(2-carboxyethyl)-phosphine (TCEP). Formation of ADT was accompanied
by the loss of H₂S (ΔM = 34 Da) from wtTHI4p, as evidenced
by ESI-FTMS analysis (Fig. 3e). In control reactions, lacking either
Fe²⁺ or ADPr 7, no modification of wtTHI4p was observed. A time
course for the reaction demonstrates a consistent stoichiometry of 1:1 
for protein modification and thiazole production (Fig. 3d, e). 
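
The "nearly one full turnover" figure can be checked with a short calculation. The sketch below divides the reported ADT yield (380 ± 5 µM) by the reported protein concentration (420 ± 21 µM) and propagates the two uncertainties to first order. Treating the two errors as independent is an assumption made here for illustration, and the function name is hypothetical.

```python
import math


def ratio_with_error(numerator, sd_numerator, denominator, sd_denominator):
    """Return a ratio and its standard deviation by first-order error propagation.

    Assumes the two measurements are independent (an assumption of this sketch).
    """
    ratio = numerator / denominator
    relative_sd = math.sqrt(
        (sd_numerator / numerator) ** 2 + (sd_denominator / denominator) ** 2
    )
    return ratio, ratio * relative_sd


# Values quoted in the text: micromolar ADT formed per micromolar THI4p supplied.
turnover, turnover_sd = ratio_with_error(380.0, 5.0, 420.0, 21.0)
print(f"turnovers per THI4p: {turnover:.2f} +/- {turnover_sd:.2f}")
```

Under these assumptions the ratio comes out at roughly 0.9 product molecules per protein molecule, which is in line with a single-turnover reaction in which a small fraction of the protein is inactive or already modified.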

The full reconstitution of ADT 5 formation, using purified 
wtTHI4p, in a reaction mixture lacking any exogenous sulphide donor,
coupled with the observed loss of 34 Da from the protein and the
structural characterization of a dehydroalanine residue at Cys 205,
provides compelling evidence that the thiazole sulphur is derived from
Cys 205 of wtTHI4p. Consistent with this model is the observation that
Cys 205 is strictly conserved in eukaryotic thiazole synthases, whereas 
Figure 3 | Reconstitution of the biosynthesis of ADT 5. a, HPLC analysis of
the metabolites associated with wtTHI4p overexpressed in M9 minimal
medium ± 100 µM iron. b, ESI-FTMS analysis of wtTHI4p overexpressed in
M9 minimal medium ± 100 µM iron shows iron-dependent modification
(ΔM = −34 Da) of the protein. c, HPLC analysis of wtTHI4p-catalysed partial
and full reactions and the relevant control reactions. Incubating THI4p with


Cys 204 is often replaced with a serine residue. Also, the THI4p C205S
mutant was shown to be inactive, whereas the C204S mutant retains its 
ability to produce ADT, indicating Cys 205 as the sulphur donating 
residue. 

Efforts towards the in vitro reactivation of modified THI4p under a 
variety of different conditions were unsuccessful. This suggests that 
THI4p is a single turnover enzyme. In addition, whereas thiamine 
biosynthetic enzymes are generally present in the proteome at very 
low concentrations, THI4p is an exception and a high level of expres- 
sion (approximately 1.5%)’* is observed during the exponential growth 
phase of Neurospora crassa. To investigate the possibility that THI4p is 
a single turnover enzyme, we first characterized THI4p, expressed at 
native levels in a yeast strain, in which a His₆ tag had been inserted at
the carboxy terminus of THI4p to facilitate its detection and purifica- 
tion. THI4p was isolated from this strain grown to mid-log phase in a 
vitamin-free defined medium using immunoaffinity chromatography 
and was analysed by gel electrophoresis and western blotting (Fig. 4a, 
b). The THI4p band was excised, digested with chymotrypsin and the 
peptide fragments were subjected to mass spectrometric analysis 
(MALDI-TOF/TOF MS). The peptide fragment (T194 to Y216) associated
with the 34 Da mass loss (m/z = 2,430.1) was again identified
in this experiment (Fig. 4c) and their fragmentation patterns were 
identical, confirming the loss of H2S followed by cyclic thioether 
formation (Supplementary Figs 4 and 5). The unmodified species for 
the same peptide (predicted m/z = 2,464 Da) was not observed in this 
ADPr 7, glycine and iron(II) results in the production of ADT 5. d, Time course
for the reaction showing a 1:1 ratio of protein modification and thiazole 
production (error bars indicate s.d.). e, Mass spectrometry analysis of wtTHI4p 
over the time course of the reaction showing the progressive conversion of the 
enzyme to the M — 34 Da species. 


experiment. Identification of the same pattern of H₂S loss from native
THI4p, as observed in the in vitro reconstitution experiments, further 
validates THI4p as the sulphur donor for thiazole biosynthesis in vivo. 
In addition, the mass spectrometric analysis identifies modified THI4p 
as the major species present in vivo, indicating the absence of a repair 
pathway in yeast for modified THI4p. This suggests that the ratio of 
THI4p to thiamine produced should be 1:1. To test this hypothesis, the 
concentration of native THI4p in yeast cell-free extract was deter- 
mined by quantitative western blot analysis, using known quantities 
of recombinantly expressed THI4p to generate a calibration curve 
(Fig. 4a), and thiamine concentration in yeast lysate and in the growth 
medium (secreted by yeast) was determined by the thiochrome assay 
(Supplementary Fig. 6). A 1:1.1 ± 0.2 stoichiometry between THI4p
and thiamine was demonstrated in yeast cultures growing at mid-log
phase (four independent experiments), supporting its role as a single
turnover enzyme (Fig. 4d).
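
Quantitative western blotting of this kind rests on a calibration curve built from known amounts of the recombinant protein. The sketch below shows one plausible way to do that: an ordinary least-squares line through the standards, followed by interpolation of the unknown lane. The standard concentrations are those listed for lanes 1-5 of Fig. 4a; the band intensities, the lysate reading and the function names are hypothetical values chosen for illustration, and a linear detector response is assumed.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_intensity(intensity, slope, intercept):
    """Invert the calibration line to estimate a concentration."""
    return (intensity - intercept) / slope


# Standards from lanes 1-5 (micromolar); intensities are made-up example readings.
standards_um = [0.5, 1.0, 2.0, 4.0, 8.0]
band_intensities = [105.0, 210.0, 395.0, 805.0, 1600.0]

slope, intercept = fit_line(standards_um, band_intensities)
lysate_um = concentration_from_intensity(600.0, slope, intercept)  # hypothetical lane 6 reading
print(f"estimated THI4p in lysate lane: {lysate_um:.2f} uM")
```

The estimate is only meaningful when the unknown band falls inside the range spanned by the standards, which is why a dilution series bracketing the expected concentration is used.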

A mechanistic proposal for the sulphide transfer chemistry involved 
in ADT 5 formation that is consistent with these observations is out- 
lined in Fig. 4e. Addition of the thiol of Cys 205 to intermediate 13, 
formed as shown in Fig. 1, followed by Fe²⁺-assisted elimination would
give 14 and generate the active site dehydroalanine observed in the
structure. It is also possible that the iron activates the sulphide transfer
by direct interaction with the sulphur. The oxygen sensitivity of the
reaction indicates that Fe²⁺ is the catalytically active iron oxidation
state. Intermediate 14 is then converted to ADT 5 as shown in Fig. 1. 
Figure 4 | Characterization of native THI4p from S. cerevisiae. a, Analysis
of THI4p in yeast cell-free extract using a western blot (right) and Coomassie
blue (left). Lanes 1-5 contain increasing concentrations (0.5, 1, 2, 4 and 8 µM)
of His₆-THI4p. Lane 6 contains yeast crude lysate. b, Native THI4p, isolated
from yeast cell-free extract, analysed by SDS-PAGE/Coomassie blue (1) and 
western blot (2). c, In-gel chymotrypsin digestion/MALDI-TOF analysis of 
isolated native THI4p demonstrates that the peptide containing the C204- 
C205 region has the same modification as observed with THI4p expressed in E. 
coli. d, Quantification of THI4p and thiamine produced in a culture of yeast, 
grown in vitamin-free defined medium, demonstrates that THI4p is a substrate 
rather than a catalyst. e, Proposed mechanism for the iron-mediated sulphur 
transfer reaction involved in the formation of intermediate 14 (Fig. 1). 


The observations reported here strongly indicate that THI4p acts as a 
co-substrate rather than an enzyme. This is very unusual but is not 
without precedent. The best characterized example of a single turnover 
enzyme is the Ada protein, which repairs O⁶-methylguanine and methylphosphotriester
lesions in DNA by transferring the methyl group to an
active site cysteine. The resulting inactive enzyme serves as a signal to 
induce other DNA repair enzymes'*"*. The possibility that inactive 
THI4p has other physiological function(s) remains to be explored. 
Interestingly, involvement of THI4p and its orthologues has been impli- 
cated in DNA protection and other stress-related pathways’? ”. The 
mechanism of this protection is not known. One possibility is that the 
abundant THI4p protects the cells by binding free cellular iron, which is 
known to cause oxidative damage via the generation of reactive oxygen 
species. 

THI4p is a remarkable protein: it is a suicide enzyme serving as a co- 
substrate rather than an enzyme for the formation of the thiazole 
moiety of thiamine. This assembly involves a complex, unprecedented 
reaction sequence in which NAD serves as the source of the five-carbon 
chain and THI4p serves as the sulphur source. The biological function
of the modified protein, if any, remains to be elucidated. 


METHODS SUMMARY 


THI4p and its mutants (pET28b vector, BL21(DE3) cell line) were overexpressed 
in LB medium and isolated using Ni-NTA chromatography as described previ- 
ously*. To isolate unmodified THI4p for activity assays, M9 minimal medium was 
used for overexpression (see Supplementary Materials for details). HPLC analyses 
of THI4p-bound metabolites and the in vitro reconstitution assays were per- 
formed as previously described®. Assays of THI4p were performed in an anaerobic 
chamber (COY Laboratory Products Inc.). A typical THI4p reconstitution re- 
action assay included ADP-ribose or NAD (final concentration 1 mM), glycine 
(final concentration 1 mM), freshly prepared FeSO₄ (final concentration 0.5 mM)
and THI4p (300-500 µM). THI4p and its mutants were desalted into 50%
methanol containing 0.1% formic acid for ESI-FTMS analyses. Chymotrypsin 
digestion/MALDI analyses of recombinant THI4p and its mutants were performed 
at the Proteomics Facility, Cornell University, using standard protocols. In-gel 
chymotrypsin digestion/MALDI-TOF/TOF MS analysis with THI4p isolated from
yeast was carried out in the Russell laboratory at Texas A&M University and the ESI- 
FTMS analysis was carried out in the Dorrestein laboratory at UCSD. The crystal 
structures of THI4p (Protein Data Bank ID 2GJC)’ and THI1p (PDB ID 1RP0)’*
were refined against the deposited structure factor magnitudes using REFMAC from 
CCP4 (refs 20 and 21). The Saccharomyces cerevisiae strain BY4741 (purchased 
from Open Biosystems, Inc.) was used for in vivo studies with THI4p. Vitamin-free 
medium A (medium YNB (yeast nitrogen base) plus 0.01% uracil, 0.01% leucine, 
0.005% histidine, 0.005% methionine, 2% glucose, and 0.03% G418) was used to 
grow the yeast strain to allow the expression of the endogenous THI4p. Endogenous 
THI4p concentration was determined using a quantitative western blot, whereas the 
concentration of thiamine was determined using the thiochrome assay"°. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


All the chemicals were purchased from Sigma-Aldrich Corporation (USA) unless 
otherwise mentioned. LB medium was obtained from EMD Biosciences. Yeast 
nitrogen base (YNB) with amino acids and ammonium sulphate, vitamin-free, 
without dextrose (powder) was purchased from US Biological, Inc. Kanamycin 
and isopropyl-β-D-thiogalactoside (IPTG) were purchased from Lab Scientific
Inc. Geneticin (G418 sulphate) was obtained from GIBCO-Invitrogen. 
Zymolyase-100T and chymotrypsin were obtained from AMS Biotechnology 
(Europe) Ltd and Promega, respectively. The Penta-His Alexa Fluor 647 antibody 
conjugate was purchased from Qiagen. The alkaline phosphatase, calf intestinal 
(CIP) was obtained from New England Biolabs Inc. Analytical HPLC was carried 
out using a Supelco LC-18-T (150 × 4.6 mm internal diameter; particle size 3 µm)
reverse-phase column. HPLC-grade solvents were obtained from Fisher Scientific. 
HPLC analyses were performed in an Agilent 1200 instrument equipped with an 
inline diode-array detector and a fluorescence detector. 

Protein expression and purification. M9 minimal medium was prepared by 
autoclaving a 2 l solution of M9 minimal salts (22.5 g; Sigma-Aldrich) and upon
cooling, complementing it with 25 ml of 50% sterilized glucose, 4 ml of 1 M
sterilized MgSO₄, 0.2 ml of sterilized 1 M CaCl₂ and 40 mg kanamycin. A
10-ml culture of BL21(DE3) cells containing the plasmid used to express
wtTHI4p was used to inoculate 2 l M9 minimal medium prepared as described
above. Cultures were agitated at 37 °C until the OD at 600 nm reached 0.6, at which 
point protein expression was induced by adding 1 mM IPTG. The temperature 
was reduced to 15°C and the culture was grown overnight with shaking. The 
cultures were then harvested by centrifugation (14,000g) and the pellets were 
frozen and stored at —80 °C until further use. To demonstrate iron dependence, 
an identical growth protocol was used except that, immediately before the induction
of protein expression, FeSO₄ was added to the culture (final concentration
100 µM).
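
For readers reproducing the medium, the final concentrations of the supplements follow from a simple dilution calculation. The sketch below reads the culture volume as 2 l of salts solution before supplementation and assumes the added volumes are simply additive; both are assumptions of this illustration, and the function name is hypothetical.

```python
def final_concentration(stock_concentration, stock_volume_ml, total_volume_ml):
    """Dilution rule C1 * V1 = C2 * V2, solved for C2 (same units as the stock)."""
    return stock_concentration * stock_volume_ml / total_volume_ml


base_volume_ml = 2000.0                    # salts solution, read as 2 l
supplement_volumes_ml = 25.0 + 4.0 + 0.2   # glucose + MgSO4 + CaCl2 stocks
total_volume_ml = base_volume_ml + supplement_volumes_ml

glucose_percent = final_concentration(50.0, 25.0, total_volume_ml)   # 50% stock
mgso4_mm = final_concentration(1000.0, 4.0, total_volume_ml)         # 1 M stock, in mM
cacl2_mm = final_concentration(1000.0, 0.2, total_volume_ml)         # 1 M stock, in mM
kanamycin_ug_per_ml = 40.0 * 1000.0 / total_volume_ml                # 40 mg in total volume

print(f"glucose {glucose_percent:.2f}%, MgSO4 {mgso4_mm:.2f} mM, "
      f"CaCl2 {cacl2_mm:.3f} mM, kanamycin {kanamycin_ug_per_ml:.1f} ug/ml")
```

Under these assumptions the medium contains roughly 0.6% glucose, 2 mM MgSO₄, 0.1 mM CaCl₂ and 20 µg ml⁻¹ kanamycin.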

To purify the protein, a cell pellet from 2 l of culture was resuspended in 30 ml
lysis buffer (10 mM imidazole, 300 mM NaCl, 50 mM NaH₂PO₄, 1 mM DTT,
pH 8) and lysed by sonication on ice (Heat Systems Ultrasonics model W-385
sonicator, 2-s cycle, 50% duty). The resulting cell lysate was clarified by centrifu- 
gation (30,000g) and the THI4p protein was purified using a 5 ml HisTrap column 
(GE Healthcare) following the manufacturer's instructions. After elution, the 
protein was buffer-exchanged using a 10-DG column (BioRad) pre-equilibrated 
with 50 mM potassium phosphate buffer, pH 7.8, containing 2mM DTT. 

To release THI4p-bound metabolites, this protein preparation was further gel- 
filtered into 20 mM Tris-HCl pH 7.8 (containing 2 mM DTT). Then it was buffer 
exchanged into 50mM KPi, pH7.8 containing 2mM DTT (or TCEP) and 
100 mM NaCl. This protein preparation was used for all assays. 

ESI-FTMS analysis of THI4p and its mutants. Samples were prepared by desalt- 
ing the protein using C18 ZipTip (Millipore) following the manufacturer’s pro- 
tocol and eluting with 75% acetonitrile containing 0.1% formic acid. This was 
diluted 1:1 with 50% methanol containing 0.1% formic acid and the resulting 
solution was used for ESI-FTMS analysis (Thermo-Fisher LTQ-FT) using a 
Biversa nanospray robot (Advion). The resulting data were deconvoluted with 
the software package Extract (licensed by Thermo-Fisher). 

Chymotrypsin digestion-MALDI analysis of recombinantly expressed, 
wtTHI4p and the R301Q mutant. Guanidinium hydrochloride (40 µl, 8 M solution
in 25 mM Tris-HCl, pH 7.5) was added to a protein sample (10 µl, 10 mg
ml⁻¹), followed by 2.5 µl of freshly prepared 200 mM DTT. After incubating the
mixture for an hour at 50 °C, 19.5 µl of freshly prepared 200 mM iodoacetamide
solution was added to a final concentration of 55 mM and the reaction mixture was
incubated for 1 h at room temperature in the dark. An aliquot of this reaction
mixture (36 µl) was added to a solution of 2.5 µg chymotrypsin in 150 µl 100 mM
Tris-HCl containing 10 mM CaCl₂. The proteolysis reaction was allowed to proceed
overnight at room temperature in the dark. The resulting peptides were
desalted using a PrepSep C18 solid phase extraction cartridge (Fisher Scientific) 
according to the manufacturer’s protocol and were subjected to MALDI-TOF/ 
TOF MS analysis using a 4700 Proteomics Analyzer. 
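
The reduction and alkylation volumes above can be checked with a short dilution calculation. The sketch below is illustrative only: it assumes the volumes are simply additive and uses the stock concentrations stated in the protocol; the names `ADDITIONS` and `final_concentrations` are hypothetical. Under those assumptions the iodoacetamide works out at about 54 mM in a 72 µl mixture, consistent with the stated final concentration of roughly 55 mM.

```python
# Each addition: (name, volume in µl, stock concentration in mM or None).
# Volumes and stocks are taken from the protocol text; volumes are assumed additive.
ADDITIONS = [
    ("guanidinium hydrochloride", 40.0, 8000.0),  # 8 M stock, expressed in mM
    ("protein sample", 10.0, None),               # 10 mg/ml, not tracked in mM
    ("DTT", 2.5, 200.0),
    ("iodoacetamide", 19.5, 200.0),
]


def final_concentrations(additions):
    """Return (total volume in µl, {reagent: final concentration in mM})."""
    total_ul = sum(volume for _, volume, _ in additions)
    final_mM = {
        name: stock * volume / total_ul
        for name, volume, stock in additions
        if stock is not None
    }
    return total_ul, final_mM


total_ul, final_mM = final_concentrations(ADDITIONS)
print(f"total volume: {total_ul:.1f} µl")
for name, conc in final_mM.items():
    print(f"{name}: {conc:.1f} mM")
```

The same function can be reused for any other step in which reagents are added sequentially to a single tube.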

In vitro reconstitution of wtTHI4p-catalysed reactions. The wtTHI4p, over- 
expressed in minimal media, was transferred to an anaerobic chamber (Coy 
Laboratory Products, Inc.). ADP-ribose or NAD (final concentration 1 mM), 
glycine (final concentration 1 mM) and freshly prepared FeSO4 (final
concentration 0.5 mM) were added to the protein (420 µM) to initiate the reaction. Control
reactions were set up by omitting various essential components. Reactions were 
incubated at room temperature for 6 h, heat-quenched and analysed by HPLC as 
previously described*. The concentration of ADT produced was measured by 
converting it to the fluorescent thiochrome phosphate, as previously described’. 

To observe the time-dependence of the THI4p-mediated reaction, a large scale 
reaction was set up (3 ml). Aliquots (100 µl) were heat-quenched at appropriate
time intervals and the quenched reaction mixture was analysed by HPLC as
previously described. Simultaneously, another 100 µl aliquot was removed from
the anaerobic chamber, rapidly buffer-exchanged, using a Bio-Spin mini desalting
column (Bio-Rad), into 25 mM Tris-HCl, pH 7.5, frozen in liquid nitrogen and
stored at −80 °C. Later, these samples were individually thawed, immediately
before use, and desalted and eluted with 75% acetonitrile containing 0.1% formic 
acid for ESI-FTMS analysis (as described above). 

Identification of dehydroalanine in the structures of THI4p and THI1p.
Consistent with the biochemical studies, electron density maps showed no density 
for the sulphur atom of Cys 205 and this residue was changed to dehydroalanine. 
In addition, several surrounding residues were adjusted to fit the electron density 
and a loop missing from the original structure was built. New loops including 
Gln 198-Thr 202, His 237-Gly 242, Pro178-Thr 185 and Asp 238-Phe 24 and 
loop Asp 219-His 228 were rebuilt to better fit the electron density. Model building 
was done using program COOT”. Analysis of the Ramachandran plot for THI4p 
showed that 91.8% of the non-glycine residues were in the most favoured region, 
7.8% in the additional favoured region, 0.2% in the generously allowed region, and 
0.2% in the disallowed region. The final refinement statistics are given in 
Supplementary Table 1. 

The crystal structure of THI1p (PDB ID 1RPO) was refined against the deposited
structure factor magnitudes using REFMAC. The structure of THI1p was
originally deposited with an annotation that the occupancy of the sulphur atom 
of Cys 172 was 0, indicating that electron density for the sulphur atom was not 
visible. Residue Cys 172 was changed to dehydroalanine and the structure was 
refined a few rounds to convergence. Analysis of the Ramachandran plot for 
THI1p showed that 92.9% of the non-glycine residues were in the most favoured 
region and the remaining 7.1% were in the additional favoured region. No residues 
were found in the generously allowed or disallowed regions. The final refinement 
statistics, which are very similar to those originally reported’’, are given in 
Supplementary Table 1. Figure 2c, d was made using PYMOL”. 
Yeast strain and cell growth. The Saccharomyces cerevisiae strain BY4741 (MATa
ura3Δ0 leu2Δ0 his3Δ1 met15Δ0) was purchased from Open Biosystems, Inc. The
MATa cells of the background strain BY4741 are tagged at the 3′ end of the THI4
ORF with a sequence encoding six histidines (6×His). The chromosomal
insertion was verified by colony PCR analysis and G418 resistance.

For each experiment, a single colony was picked from a freshly streaked G418 
plate that was incubated at 30 °C for 2 days. For cell growth, 10 ml of YPD medium 
(1% yeast extract, 2% bacto peptone, 2% glucose) was inoculated with this colony 
and incubated in a 100 ml sterile Erlenmeyer flask at 30 °C (shaking speed
270 r.p.m.) for 12 h or until an attenuance of 4.0 at 600 nm was reached. The
resulting cells were collected by centrifugation, washed (3×) with vitamin-free
medium A (medium YNB (yeast nitrogen base) plus 0.01% uracil, 0.01% leucine, 
0.005% histidine, 0.005% methionine, 2% glucose and 0.03% G418), suspended in 
50 ml of vitamin-free medium A containing the antibiotic G418 to an attenuance 
of 0.07 at 600 nm and transferred to a 250 ml sterile Erlenmeyer flask. This was 
incubated for 15 h (the predicted mid-log phase) at 30 °C (shaking speed
270 r.p.m.) and the cells were collected by centrifugation at 4,000g for 10 min.

For experiments to determine the THI4p or thiamine concentration, cells
(30–40 mg wet weight) were collected from 50 ml of vitamin-free medium A.

For the mass spectrometric experiments, the cells (5–10 g, wet weight) were
collected from 6 l (3 × 2 l) of vitamin-free medium A.
Western blot analysis. Yeast cell-free extract was analysed by SDS-PAGE, trans- 
ferred to polyvinylidene difluoride (PVDF) membranes using a Panther semi-dry 
electroblotter (Owl Separation Systems), and then blocked in TBS (Tris-buffered 
saline, 50 mM Tris-HCl, pH 7.4 and 150 mM NaCl) plus 3% BSA and 0.05%
Tween 20 for 1 h, before incubating with Penta·His Alexa Fluor 647 (Qiagen)
anti-His antibody conjugates (1:5,000 dilution) for 1 h in the dark. After two
rounds of 10-min washes with TBS in the dark, the membrane was then subjected
to direct immunofluorescent detection by Typhoon Trio (excitation, 633 nm red
laser; emission, 670 nm band-pass filter (670 BP 30)) from GE Healthcare
Biosciences. 

The above protocol was followed to confirm the occurrence and/or purity of
endogenous THI4p isolated from yeast.
Analysis of the THI4p/thiamin ratio in vivo. The yeast cultures were grown as 
described above. The cells grown to the mid-log phase in 50 ml of vitamin-free 
medium A were harvested by centrifugation at 4,000g for 10 min. A portion 
(20 ml) of the supernatant (the clarified culture medium) was collected and frozen 
at −80 °C for at least 12 h. This was then lyophilized to dryness and the residue was
used to determine the amount of thiamine secreted into the growth medium 
(details will follow). 

The wet weight of cells from the 50 ml culture was 30 mg. Spheroplast formation
was carried out according to published protocols. Unless indicated otherwise,
all steps were performed at 4 °C. Cells (30 mg) were re-suspended in 30 µl of
zymolyase buffer 1 (50 mM Tris-HCl, pH 7.5, 10 mM MgCl2, 30 mM TCEP),
incubated for 15 min at room temperature, centrifuged at 400g for 5 min and
the supernatant was discarded (thiamine concentration in this fraction was
negligible). The cell pellet was re-suspended in 100 µl of zymolyase buffer 2 (50 mM
Tris-HCl, pH 7.5, 10 mM MgCl2, 1 mM TCEP) and the cell wall was enzymatically
digested with 2 mg ml⁻¹ of zymolyase-100T (1.0 × 10⁵ U g⁻¹), which contained
β-1,3-glucan laminaripentaohydrolase (from Arthrobacter sp.), β-1,3-glucanase
(from Arthrobacter sp.), mannanase and protease. The suspension was incubated
for 30 min at 30 °C with shaking at 60 r.p.m. Light microscopy confirmed
complete spheroplast formation. The spheroplasts were then centrifuged at 400g for
5 min and the supernatant was collected in a 1.5 ml Eppendorf tube (Tube A). The
spheroplasts were washed with 100 µl of ice-cold zymolyase buffer 2 followed by
centrifugation at 400g for 5 min, and the supernatant was added to Tube A. The
spheroplast pellet was resuspended in 100 µl of ice-cold lysis buffer (50 mM
Tris-HCl, pH 7.5, 10 mM MgCl2, 1 mM TCEP, 10 mM KOAc, 1 mM PMSF plus
protease inhibitor cocktail (100 mM AEBSF, 500 mM 1,10-phenanthroline,
2.2 mM pepstatin A, and 1.4 mM E-64)). The suspension was homogenized on
ice in a Dounce homogenizer (1 ml) by 20 strokes with a large clearance pestle “A”
(Kontes Glassware). The spheroplast lysate was transferred to Tube A. An equal
volume (300 µl) of extraction buffer (lysis buffer plus 0.8 M (NH4)2SO4 and 20%
glycerol) was added to Tube A. The tube was agitated on a rocker for 30 min and
centrifuged for 30 min at 30,000g. The supernatant (600 µl) was then split into two
equal portions. One portion (300 µl) was analysed for THI4p while the other was
analysed for thiamine.

To analyse for THI4p, one of the above portions was concentrated to 100 µl and
5 µl from that concentrate was loaded onto a 12% SDS-PAGE gel along with
recombinant THI4p standards (0.5, 1, 2, 4 and 8 µM). A standard curve was determined
with each western blot. The SDS PAGE and quantitative western blot analysis were 
performed as described above. The THI4p band densities were analysed using the 
ImageQuant image analysis software provided with the Typhoon Trio. 
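
As an illustration of how such a standard curve can be turned into a concentration estimate, the sketch below fits a straight line to band density against standard concentration and inverts it for an unknown band. The band densities are hypothetical placeholders, not measured values, and a linear detector response over 0.5 to 8 µM is an assumption; the function names are also invented for this example and do not describe the ImageQuant software.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def density_to_concentration(density, slope, intercept):
    """Invert the fitted line to estimate concentration (µM) from band density."""
    return (density - intercept) / slope


# Standard concentrations from the protocol (µM).
standards_uM = [0.5, 1.0, 2.0, 4.0, 8.0]
# Hypothetical band densities (arbitrary units), for illustration only.
band_densities = [1100.0, 2050.0, 4100.0, 7900.0, 16100.0]

slope, intercept = fit_line(standards_uM, band_densities)
unknown_density = 6000.0  # hypothetical sample band
estimate = density_to_concentration(unknown_density, slope, intercept)
print(f"estimated THI4p in loaded sample: {estimate:.2f} µM")
```

Because the lysate was concentrated before loading, an estimate obtained this way refers to the concentrated sample and would still need to be scaled back to the original volume.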

The second 300 µl portion of the cell lysate was treated with 5 µl of calf intestinal
alkaline phosphatase (CIP) and incubated at 37 °C for 18 h to convert the thiamine
vitamers to thiamine alcohol. Potassium acetate (50 µl of 4 M) was added to 100 µl
of the alkaline-phosphatase-treated sample followed by oxidative cyclization to
thiochrome using 50 µl of a saturated solution of K3Fe(CN)6 in 7 M NaOH. The
oxidation reaction was neutralized after 1 min with 6 M HCl and 100 µl from the
reaction mixture was analysed by reverse phase HPLC with fluorescence detection
(excitation at 365 nm, emission at 450 nm). The following linear gradient, at a flow
rate of 1 ml min⁻¹, was used. Solvent A is water, solvent B is 100 mM KH2PO4,
pH 6.6, and solvent C is methanol: 0 min, 100% B; 2 min, 10% A, 90% B;
10 min, 25% A, 15% B, 60% C; 12 min, 25% A, 15% B, 60% C; 15 min, 100% B;
17 min, 100% B. A calibration curve was generated using commercially available
thiochrome following the same HPLC procedure.
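
The gradient above is specified only at its breakpoints. The following sketch shows one way to compute the solvent composition at an intermediate time by linear interpolation between consecutive breakpoints. The table `GRADIENT` transcribes the values given in the text; the function name `composition_at` and the choice to hold the first and last compositions outside the 0 to 17 min window are assumptions made for illustration.

```python
# Breakpoints from the text: (time in min, (%A water, %B phosphate buffer, %C methanol)).
GRADIENT = [
    (0.0, (0.0, 100.0, 0.0)),
    (2.0, (10.0, 90.0, 0.0)),
    (10.0, (25.0, 15.0, 60.0)),
    (12.0, (25.0, 15.0, 60.0)),
    (15.0, (0.0, 100.0, 0.0)),
    (17.0, (0.0, 100.0, 0.0)),
]


def composition_at(t_min):
    """Linearly interpolate (%A, %B, %C) at time t_min (minutes)."""
    if t_min <= GRADIENT[0][0]:
        return GRADIENT[0][1]
    for (t0, c0), (t1, c1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            frac = (t_min - t0) / (t1 - t0)
            return tuple(a + frac * (b - a) for a, b in zip(c0, c1))
    # Past the last breakpoint: hold the final composition (an assumption).
    return GRADIENT[-1][1]


for t in (1.0, 6.0, 11.0, 13.5):
    a, b, c = composition_at(t)
    print(f"{t:5.1f} min: {a:.1f}% A, {b:.1f}% B, {c:.1f}% C")
```

Every breakpoint sums to 100%, so the interpolated compositions do as well, which gives a quick sanity check on the transcribed table.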

To determine the thiamine content of the lyophilized culture medium, it was
re-dissolved in 1,000 µl of double distilled H2O. This solution (500 µl) was
incubated at 37 °C for 18 h with 5 mg of acid phosphatase and an aliquot (100 µl) was
analysed for thiamine using the thiochrome assay described above.

The THI4/thiamine ratio was determined in quadruplicate (Fig. 4d). 

MALDI analysis of native THI4p (ScTHI4p). The yeast cells from the large scale
cultivation (6 l) were harvested by centrifugation, re-suspended in the
appropriate buffer and lysed by the zymolyase-homogenizer method as described above.
THI4p was partially purified from the clarified protein lysate by immunomagnetic 
beads using Anti-His monoclonal antibody coated microbeads (µMACS, Miltenyi
Biotec) following the manufacturer's protocol. The resulting proteins were further
purified by SDS-PAGE (12%). Electrophoresis was performed at 110 V and a
maximum of 15 mA for 1.5 h. The bands were stained with Coomassie blue.
The protein band corresponding to THI4p (confirmed by western blot) was 
excised and digested with chymotrypsin at 37 °C overnight using the following 
protocol: the gel slice was washed with 25 mM ammonium bicarbonate (pH 8) and 
dehydrated with a mixture of acetonitrile (ACN) and 50mM ammonium 
bicarbonate (v/v, 2:1). The washing and dehydrating steps were repeated twice. 
The supernatant was then removed and the gel slice was dried in a vacuum 
centrifuge (SpeedVac Concentrator, Savant). Chymotrypsin (Promega) dissolved 
in 25 mM ammonium bicarbonate (10 µl of a 20 ng µl⁻¹ solution) was added to the
dried gel slice. After the gel slice was completely rehydrated, 20 µl of 25 mM
ammonium bicarbonate was added to cover the gel slice and proteolysis occurred 
at 37 °C overnight. The chymotrypsin digested sample was acidified with formic 
acid (pH 2-3), and desalted using a C18 ZipTip pipette tip (Millipore) according to 
the manufacturer’s protocol. After desalting, the chymotrypsin digested sample 
was mixed 1:1 (v/v) with the MALDI matrix (5 mg ml⁻¹ α-cyano-4-hydroxycinnamic
acid, 50% (v/v) ACN, 10 mM ammonium dihydrogen phosphate, 0.5%
trifluoroacetic acid) and 1 µl of the resulting mixture was spotted onto a
MALDI sample plate. MALDI-MS and MS/MS experiments were performed
using a 4700 Proteomics Analyzer (Applied Biosystems). Collision-induced
dissociation (CID) spectra were acquired using air as the collision gas (medium
pressure setting) and at 1 kV of collision energy in the Russell laboratory at
Texas A&M University.

The picturesque French city of Grenoble attracts both mountain climbers and ambitious researchers.


Peak of potential 


Once known mostly for its natural beauty, Grenoble is 
becoming a centre of innovation for academia and industry. 


BY KATHARINE SANDERSON 


Seated beneath some of Europe’s most
spectacular mountains (the Chartreuse,
Vercors and Belledonne ranges in the
French Alps), Grenoble is more than a
playground for adrenalin-seeking skiers. It is a
university and research town that hosts 22,800
research jobs, including 3,500 PhD students,
11,800 posts in public research and 7,500 in
private research. Grenoble is fast becoming an
innovation hub.

The city is home to one of France's
longest-established science parks, and to multiple
institutions intent on fostering technology transfer.
The local network of angel investors is one of
the most successful in the country, and
Grenoble is second only to Paris in the number
of French patent applications filed each year.
The city has plans to expand its innovation
and university links through large projects that
could bring in billions of euros in funding and
create thousands of jobs. It also provides
support for scientific start-ups. But there are
big challenges, in particular the French
academic environment, which many deem
unfriendly to postdocs at best.


INTERNATIONAL FLAIR 

There are 13 public research institutes in 
Grenoble. Four are international: a campus of 
the European Molecular Biology Laboratory 
(EMBL); the European Synchrotron Radiation 
Facility; the Institute Laue–Langevin (ILL), a
neutron source; and the laboratories of the
Institute for Millimetre-wavelength Radioastronomy.
The other institutes include the
University of Grenoble, composed of multiple
institutions, and French national research centres
such as the Atomic and Alternative Energies
Commission and the headquarters of the
National Centre for Scientific Research, the
country’s main research body.

The construction continues. In July, the 
French government announced that the Greno- 
ble Innovation for Advanced New Technologies 
(GIANT) campus is to host a virtual nano- 
electronics research centre that will coordinate 
research projects at local institutes and compa- 
nies. It is expected to create 8,000 jobs, although 
the proportion that will be in research is not yet 
known. Some €460 million (US$630 million) 
will be invested over 10 years, half from govern- 
ment and half from private investors. 

“There is really a tradition of applied 
research and transfer to the real world here,” 
says Véronique Pequignat, head of inter- 
national business investment at the Grenoble 
region’s economic-development body, the 
Agency for Studies and Promotion of Isère
(AEPI). In 2009, Grenoble accounted for 780 
of France’s 15,000 patent applications. The 
city’s contribution made up slightly more 
than one-third of the entire number of patents 
applied for in the surrounding Rhône-Alpes
region, which includes Lyons, the second- 
largest metropolitan area in France. 

The Inovallée, one of France's first science 
parks, opened in 1972 in Meylan, just outside 
Grenoble. It focuses on collaborations between 
academia and industry, and hosts 320 compa- 
nies. Grenoble also provides a supportive envi- 
ronment for ambitious and entrepreneurial 
young scientists. In the past 10 years, the area 
has seen the creation of 200 start-ups in fields 
ranging from the life sciences to fluid mechan- 
ics. Of those, 132 were supported by Grenoble 
Alps Incubation (GRAIN), which was founded 
in 1999 by a group of academic and govern- 
ment bodies. So far, it has created more than 
800 salaried research and other posts by sup- 
porting start-ups. 


INNOVATION STATION 
Among the start-ups funded by GRAIN is Ecrins 
Therapeutics, which was set up in July 2010 
by Andrei Popov, a physician-turned-cancer- 
biologist. His company has six employees — 
four scientists and two technicians — and is 
developing one anticancer product. It hopes to 
broaden its portfolio, says Popov, who is trying 
to raise €3 million to expand the business. So far, 
his team has received two French national inno- 
vation grants, worth €220,000 each. His com- 
pany is among several located at BIOPOLIS, a 
biotechnology hub run by Joseph Fourier Uni- 
versity (UJF), one of the University of Grenoble’s 
science and technology institutions. 

After being selected for investment by
GRAIN, Popov was given the opportunity to
attend business seminars weekly for 18 months 
while his idea developed. GRAIN offers its 
start-ups a credit line (which Popov and his 
team did not fully use) of up to €45,000 to be 
paid off through investments or money earned 
from sales over three years, starting three years 
after the company has launched. The business 
incubator also provides a stipend of €1,150 per 
month for company creators who don't have 
a paid job. Popov still has access to a GRAIN 
adviser, who offers counselling on matters from 
team building to financing. 

Popov’s experience is part of a larger trend. 
Science and technology are “in the ecosystem
here”, says Cheikhou Dieye, managing director
of Grenoble Angels, a network of private inves- 
tors who support small companies in their ini- 
tial stages. It is among the most successful of 
the country’s 85 such groups. When Grenoble 
Angels first launched in 2005, the city’s tradition 
of established technology companies led the 
group to focus on semiconductor, nanotechnol- 
ogy and Internet companies, but biotechnology 
is now a big part of the portfolio; last year, four 
of the companies in which it invested were bio- 
tech start-ups. 


ACADEMIC AMBITIONS 
Academia is also benefiting from Grenoble's 
science expansion. The Grenoble-Alps Univer- 
sity of Innovation (GUI+), a super-university 
that will unite many of Grenoble’s existing 
research institutions, is part of the govern- 
ment’s Excellence Initiative (Idex) to augment 
university campuses. Funded by a €400-million
grant, it will open as early as 2016. To hire
more faculty members and staff, Yannick Vallée,
head of the project and a UJF chemist, is seeking
an extra €1 billion in Idex stimulus grants;
he will find out in January 2012 whether
he will get the funds. He estimates that
the stimulus grant would pay about
€40 million a year, which would fund 20
research posts and an as-yet-undetermined
number of grants to support PhD students.
Vallée would like to recruit researchers
specializing in technology, especially
nanotechnology, electronics and computer science.

Meanwhile, GIANT was selected for govern- 
ment funding in 2008 as one of 12 interna- 
tional campuses of excellence. It is being 
developed on a 250-hectare site that houses 
— and will house — some of the city’s large 
research projects. These include MINATEC,
a micro- and nanotechnology innovation
campus that employs 2,400 researchers and
files 300 patents a year. GIANT as a whole
employs 6,000 researchers at the moment, and
aims to grow to 10,000 researchers, 10,000
students and 10,000 industrial jobs by 2015.

Grenoble hosts international institutes such as the European Synchrotron Radiation Facility (centre).

However, Grenoble scientists share some 
frustrations with other French researchers. 
“Finding a job is not easy. France creates a 
lot of postdoc researchers and there are rela- 
tively few permanent positions,” says Popov. 
François Briatte, a health-policy PhD student
and a member of the Young Researchers’ Fed- 
eration (CJC) in Paris, says that academia is 
“in a dire state of contractual anarchy that 
constrains and obscures the job prospects of 
young researchers in France”. Briatte and the 
CJC assert that there are “serious, enduring 
shortages on the tenure-track academic mar- 
ket”. The nation’s academic institutions, says 
Briatte, don't have the budget from central gov- 
ernment to employ enough researchers. 


ROTATING POSITIONS 

But the EMBL and the ILL offer a different 
employment model. The EMBL follows the 
same system as its locations across Europe: the 
Grenoble campus employs around 90 scien- 
tists, each on initial five-year contracts that are 
renewed for two years at a time to a maximum 
of nine years. The best 10% are offered per- 
manent jobs when their contracts expire, says 
EMBL spokeswoman Sonia Furtado. 

The ILL also has about 90 scientists on five- 
year contracts, but they are not offered the 
chance to renew. Instead, every 18 months, 
young scientists who are two or three years 
into their contracts compete for a small num- 
ber of permanent positions at the centre. Over 
the past five years, between two and eight new 
scientists have been recruited each year. The 
arrangement, which is unique to the ILL’s Grenoble
location, required special permission
from the French labour council, says Elizabeth
Moulin, head of recruitment and integration at 
the institute. In France, institutions are usually 
allowed to give postdocs only three-year con- 
tracts or permanent positions, she says. “The 
new people bring fresh blood and new ideas,” 
says Moulin. “In general, we tell our candidates 
that their chance to get a permanent position 
at the ILL is about one in three.”

Employment caveats might not be the only 
damper on Grenoble’s scientific ambitions 
— a national and local distaste for animal 
research could also have an impact. BIO- 
POLIS was built on the condition that it would 
not contain an animal house. “For oncology 
research we are talking about rats and mice, 
the workhorses of drug discovery,” says Popov.
“Yet we have no right even to bring a single 
living mouse inside — we have to use animal 
houses of the academic institutes nearby. That 
means that the technicians have to spend an 
hour a day walking between BIOPOLIS and 
the research.” Concerns about the purported
risks of nanotechnology have also prompted 
protests: in 2005, demonstrators occupied 
cranes involved in building MINATEC. Bri- 
atte says that industry in Grenoble should take 
these protests seriously and develop a social- 
mediation process. 

Nevertheless, the area’s entrepreneurial 
activity remains a big draw — as do its moun- 
tains. And rather than being a distraction, 
some argue that the natural beauty of the area 
encourages a culture of ambitious scientists. 
“People who want to scale mountains are often 
successful in other areas,” says Imre Berger, a
genome biologist at the EMBL. “These are the
people you want in your lab. They play hard
and they work hard,” he says. “They are eager
and able to scale a pinnacle.”


Katharine Sanderson is a freelance writer 
based in Toulouse, France. 




TURNING POINT 


Josefina del Marmol 


Argentinian PhD student Josefina del 
Marmol, who studies biological sciences at 
the Rockefeller University in New York, once 
planned to become a classical pianist. But 
Bach took a back seat, and now del Marmol 
is focusing on biophysics and molecular 
neurobiology as one of 48 inaugural 
Howard Hughes Medical Institute (HHMI) 
international student research fellows. 


Do you miss music’s former role in your life? 
There's a piano at Rockefeller that everyone 
can use. It’s so great that it’s there, because I 
couldn't bring mine from home. I like to play 
all classical music. But now it’s just a hobby. 


What prompted you to pursue science? 

I wanted to be a musician, but in high school I
took a biology class, and that changed every- 
thing. We studied evolution, and I couldn't 
stop reading and studying it. At college, I 
took up biology and never looked back. 


Describe your first major project. 

As an undergraduate, I developed a fluo- 
rescent probe for tissue that lets you control 
what is being lit and when. That work was 
published last year in Analytical Chemistry 
and I was first author. It reaffirmed my inter- 
est in pursuing a science career. 


What specific area are you working in? 
Mechanosensation — the conversion of 
mechanical stimuli into cellular responses. It 
has a role in the sense of touch and in physi- 
ological processes such as blood-flow sensing 
by vessels. Unlike for senses such as smell or 
vision, the molecular nature of mechano- 
sensation remains poorly understood. 


How did you become interested in the topic? 
I attended a lecture on mechanically gated 
ion channels by Roderick MacKinnon, and 
decided to do a rotation in his lab. I began 
monitoring how ion-channel activity 
responds to mechanical stimulation. 


What advice can you offer others looking to 
work for big names such as MacKinnon? 

You have to feel genuine interest and moti- 
vation for the question being researched. If 
you are in it only for the prestige, it will show 
during your interviews. 


What has been your most significant 
challenge so far? 

Coming to the United States. It’s been a huge 
adjustment. All my undergraduate biology 


courses were in Spanish, but here, the science 
is very intense, yet I have to speak and write 
in English. Plus it’s very cold and the light 
gets dim at 4 o'clock in the afternoon. This 
will be my second winter here, and I know 
I'm in for months and months of suffering. 


Why did you want to study in the United 
States? 

While I was an undergraduate in Argen- 
tina, I wasn’t sure I wanted to go abroad, 
but I met lecturers from Rockefeller. They 
were so free — the way they thought, what 
they were doing. Scientists in Argentina are 
limited by money and resources, and the 
scientific community is much smaller than 
in the United States. You can’t always study 
exactly what you want because there may not 
be a lab working on it. At Rockefeller, I work 
with top-notch scientists every day — I’m far 
closer to where science is actually happening. 


Has the HHMI award changed your opinion 
on the feasibility of a career in academia? 
Yes. Coming into a graduate career in the 
United States, I was aware that funding is 
quite hard to find for international people. 
But so far, both Rockefeller and HHMI have 
supported me, which gives me the idea that 
it’s not impossible to build up a career here 
regardless of my citizenship, even in times of 
financial crisis. 


Is there one issue that consistently crops up 
in your work? 

Managing stress. I watch my principal inves- 
tigator; although he’s under a lot of pressure, 
he enjoys the science he does. And that’s a 
good way to do it — be really motivated, but 
stay calm and try to have fun.

UK RESEARCH
Industry support falls 


UK industry must invest in research and
create more scientific jobs to uphold
the nation’s economic position, says an
analysis. Global Research Report: United
Kingdom, released on 19 October by
Thomson Reuters, found that Britain is
a world leader in key research indicators
including highly cited papers. Almost 20% 
of articles with more than 1,000 citations 
come from the United Kingdom (that is, 
have at least one author in Britain), more 
than from any nation except the United 
States. Yet private investment in research 
has fallen since 1991. “Industry has failed 
to establish opportunities for talented 
researchers,” says report author Jonathan 
Adams, Thomson Reuters director of 
research evaluation, based in Leeds, UK. 
“We're going to find ourselves heading for 
second-rate economic status.” 


AUSTRALIA 
Academics unhappy 


Australian academic researchers are 
rallying behind a report that laments their 
working conditions. The government- 
funded study, out in September, surveyed 
5,525 academics across all career stages and 
fields at 20 universities. It found that nearly 
half of academics under 30 want to leave 
the country or the profession owing to low 
pay and lack of job security. Researchers 
are frustrated by teaching obligations that 
cut into research time; low grant success 
rates; and 70- to 80-hour working weeks. 
Emmaline Bexley, a lecturer in higher
education at the University of Melbourne
and lead author of the study, says she hopes
that her research will “help government
and universities to work together to
replenish the academic workforce”.


COLLABORATION 


Regional pact formed 


An agreement will let postdocs and early- 
career researchers from Singapore and 
Europe apply for training funds in each 
other's regions. Under the three-year pact,
which was announced on 13 October
and aims to stimulate collaboration,
Singaporean scientists can seek European 
Molecular Biology Organization (EMBO) 
fellowships and grants and undertake 
EMBO training courses and activities. 
Fellowships will be available to European 
scientists wishing to work in Singapore. 
The pact is between EMBO, the European 
Molecular Biology Conference and the 
Singaporean government.

AN EASY SALE


BY BIREN SHAH 


When the perky birthsuiter girl
came to greet him, Allen’s EPU
(emotion processing chip)
interrupted normal conscious processing
and adjusted internal state variables to
disappointment.

“Hi, I'm Jeannette,” the girl said with a big
smile. 

“Are you sure we can't do this over the 
net?” Allen asked. 

“Don't be silly. You're already here.” She
turned on one high heel. 

Allen centred his visual focus below 
the edge of her black pinstriped skirt-suit, 
watching the smooth skin of her calves rip- 
ple as she walked into the conference room. 
He instructed follow, and a subroutine 
activated his leg actuators. After a few 
nanoseconds, his positional readings 
began to change. His visual cortex 
reallocated resources to handle the 
optic flow, shrinking the area he could 
see in focus. It’s just like them, he 
thought. They don’t know a 
damned thing about their 
customers. 

When he finally 
made it into the chair 
across from her, she 
said: “You've been a 
vocal critic of most of 
our releases since the 
116 series you're wear- 
ing now —” 

He began queuing up his 
response. “You keep introduc- 
ing upgrades for huge profits and 
keep gouging your customers...” 

As his voice modulator pro- 
cessed and spoke his words, Jean- 
nette flipped her lush black hair 
and rubbed the brown skin of her 
neck with the back of her hand. Her 
tendons pulled tight, framing the 
hollow of her neck in a little v. 

If they were jacked in, he'd be the
one waiting on her. And there cer- 
tainly wouldn't be any distractions 
like this. 

His EPU launched the memory 
of his last touch. He and his wife had 
been outside, sitting on a lawn. The sun 
warmed his shrivelled flesh. A weak 
breeze cooled his left arm from behind. 
She sat between his legs, leaning against 
him as his atrophied muscles strained to 
hold their weight. He'd insisted he get to be 
the one to hold her despite her protests that 
he was too weak from the chemo. It had 
been his last chance to feel like a true man. 
He leaned closer, just by having the desire, 
and her blond hair tickled his ear as it flut- 
tered against him. He took a deep breath to 
commit the smell of her — vanilla and jas- 
mine and a touch of sweat — to memory. 
His fingers stuck to her glistening pale skin 
as he caressed her arm, then slipped only to 
catch again. Goosebumps rose where he had 
touched. 

It was one of his last birthsuit memo- 
ries, stored perfectly in the artificial brain 
housing his mind. He could go back to that 
moment and re-experience every sensation. 

He never did. 

“I'm familiar with your objections,” Jeannette said. 
Allen realized he'd just trailed 
off. This beautiful girl had him 
swimming in those terrible 
currents of sensation again. He 
shut down the memory-playing 
process — but he had to wait 
for his emotion chip to 
flush his state buffer of 
the downward acceler- 
ometer reading meant 
to simulate a sink- 
ing feeling of loss. 
“Then you know I've 
added every feature 
you released since 
the 116s aftermarket 
for a tenth of the cost of 

an upgrade.” 

“I'm not here to argue that 
with you.” She leaned for- 
ward and Allen couldn't pull 

his visual focus from a flash 
of cobalt blue satin where her 
jacket parted. 

“Enjoying the view?” she asked. 

The salmon-coloured chromo- 
plasts in Allen’s cheeks flared. His 
central clock rate accelerated. 
“Don't worry — it’s kind of why 
we're here.” Jeannette stood up 
and removed her jacket. “Although 
you're right that most customers got 
only aesthetic value out of the 200s, 
300s and 400s, [...] 
product map.” She turned around. The blue 
blouse cut down to her waist, exposing the 
unblemished skin of her back. She pressed 
against her neck and a small flap appeared. 
She pulled on it gently, peeling the skin away 
to reveal a titanium alloy spine. 

“That's not a birthsuit.” Allen struggled to 
get the words queued up between his EPU’s 
rapid-fire interrupts. 

“No, you're getting an advanced look at 
the 500 — all the technologies we’ve been 
developing for the past 150 years wrapped 
in a biological prosthetic.” She pushed her 
skin back in place and sat down. 

Allen focused on her forearm and 
instructed index-finger forward-back, for- 
ward-back. His finger reached out and poked 
her twice. “Does this feel ... right?” 

“The 500 is virtually indistinguishable 
from a birthsuit, from inside and out.” 

Allen focused on her hand and instructed 
palm surface dock (tolerance: 30 cm of current 
position) followed by 5 cm forward-back, 
15 seconds. His hand rubbed the back of her 
hand. The sensory feedback came back like 
he was rubbing a pan with a spatula. 

“We want to offer you an upgrade, free of 
charge.” 

Manipulation-warning subroutines fired, 
but Allen dismissed them and shut down the 
monitoring process. What did she feel like? 
Smell like? 

“Your blog reaches nearly 10 million of 
our most savvy customers. As great as the 
500s are, after an initial exploring phase, our 
beta testers regularly end up weighing over 
300 pounds. Obviously, this causes some dissatisfaction.” 

Allen zoomed his visual field in on the 
shape-shifting of her mouth as she talked. 
He tried to remember what it was to have 
such a mouth — supple, responsive ... 

“We want you in our PR department, edu- 
cating our customers on the maintenance 
requirements of a biological prosthesis. 
We're thinking for say a period of 50 years?” 

A contract appeared in Allen's inbox. With 
a single thought, he signed the document 
with his public key and sent it back. “What 
was that about dissatisfaction?” 

“Never mind,” she answered after a screen 
in the table flashed a message. “You won't 
have the same problems. I can tell.” 


Biren Shah has recently moved to New York 
City, changed careers, and started writing 
again. He’s exploring as fast as he can. You 
can watch at www.birenshah.com. 

Verbal and non-verbal intelligence changes in the teenage brain 

Sue Ramsden, Fiona M. Richardson, Goulven Josse, Michael S. C. Thomas, Caroline Ellis, Clare Shakeshaft, 
Mohamed L. Seghier & Cathy J. Price 


Intelligence quotient (IQ) is a standardized measure of human 
intellectual capacity that takes into account a wide range of 
cognitive skills. IQ is generally considered to be stable across the 
lifespan, with scores at one time point used to predict educational 
achievement and employment prospects in later years. Neuroimaging 
allows us to test whether unexpected longitudinal fluctuations 
in measured IQ are related to brain development. Here we 
show that verbal and non-verbal IQ can rise or fall in the teenage 
years, with these changes in performance validated by their close 
correlation with changes in local brain structure. A combination of 
structural and functional imaging showed that verbal IQ changed 
with grey matter in a region that was activated by speech, whereas 
non-verbal IQ changed with grey matter in a region that was 
activated by finger movements. By using longitudinal assessments 
of the same individuals, we obviated the many sources of variation 
in brain structure that confound cross-sectional studies. This 
allowed us to dissociate neural markers for the two types of IQ 
and to show that general verbal and non-verbal abilities are closely 
linked to the sensorimotor skills involved in learning. More 
generally, our results emphasize the possibility that an individual’s 
intellectual capacity relative to their peers can decrease or increase 
in the teenage years. This would be encouraging to those whose 
intellectual potential may improve, and would be a warning that 
early achievers may not maintain their potential. 

An individual's abilities and capacity to learn can be partly captured 
by the use of verbal and non-verbal (henceforth performance) 
intelligence tests. IQ provides a standardized method for measuring 
intellectual abilities and is widely used within education, employment 
and clinical practice. In the absence of neurological insult or de- 
generative conditions, IQ is usually expected to be stable across life- 
span, as evidenced by the fact that IQ measurements made at different 
points in an individual’s life tend to correlate well. Nevertheless, 
strong correlations over time disguise considerable individual vari- 
ation; for example, a correlation coefficient of 0.7 (which is not unusual 
with verbal IQ) still leaves over 50% of the variation unexplained. The 
study that we report here tested whether variation in a teenager’s IQ 
over time correlated with changes in brain structure. If it did, this 
would provide construct validity for the increase or decrease of IQ 
in the teenage years, because if IQ changes correspond to structural 
brain changes then they are unlikely to represent measurement error in 
the IQ tests. In addition, if verbal and performance skills change at 
different rates in different individuals, the neural markers for verbal 
and performance IQ changes could in principle be dissociated. This 
would overcome two of the challenges faced by previous studies of 
between-subject variability in IQ measures at a given time point: verbal 
and performance IQ are tightly correlated in individuals, so it has 
been hard to identify neural structures corresponding to each; and 
there are many sources of between-subject variance in brain structure 
(for example gender, age, size and handedness) that hide the relevant 
differences. 
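
The arithmetic behind the remark that a correlation of 0.7 still leaves over 50% of the variation unexplained can be checked directly: the shared variance is the square of the correlation coefficient. The following is only an illustrative sketch; the function name is hypothetical and is not part of the study's analysis.

```python
def unexplained_variance(r: float) -> float:
    """Fraction of variance in one measure not accounted for by the other,
    given a Pearson correlation coefficient r (shared variance is r squared)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    return 1.0 - r * r


# r = 0.7 is the example value quoted in the text above.
print(f"{unexplained_variance(0.7):.2f}")  # prints 0.51
```

With r = 0.7 the shared variance is 0.49, so just over half of the variation is left unexplained, as stated.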


Our participants were 33 healthy and neurologically normal 
adolescents with a deliberately wide and heterogeneous mix of abilities 
(see Supplementary Information for details and the implications of our 
sampling for the generalizability of our conclusions). They were first 
tested in 2004 (‘time 1’) when they were 12-16 yr old (mean, 14.1 yr). 
Testing was repeated in 2007/2008 (‘time 2’) when the same indivi- 
duals were 15-20 yr old (mean, 17.7 yr). See Table 1 for further details 
of the participants. During the intervening years, there were no testing 
sessions, and participants and their parents had no knowledge that 
they would be invited back for further testing. On both test occasions, 
each participant had a structural brain scan using magnetic resonance 
imaging (MRI) and had their IQ measured using the Wechsler 
Intelligence Scale for Children (WISC-III) at time 1 and the Wechsler 
Adult Intelligence Scale (WAIS-III) at time 2 (see Supplementary 
Information for details). These two widely used, age-appropriate assessments 
produce strongly correlated results at a given time point, consistent 
with them measuring highly similar constructs. Scores on 
individual subtests are standardized against age-specific norms and 
then grouped to produce separate measures of verbal IQ (VIQ) and 
performance IQ (PIQ), with VIQ encompassing those tests most related 
to verbal skills and PIQ being more independent of verbal skills. 
Nevertheless, VIQ and PIQ scores are very significantly correlated with 
each other across participants: in our sample, the correlations between 
VIQ and PIQ were r = 0.51 at time 1 and r = 0.55 at time 2 (in both 
cases, n = 33; P < 0.01). Full-scale IQ (FSIQ) is the composite of VIQ 
and PIQ and is regarded as the best measure of general intellectual 
capacity (the g factor), which has previously been shown to correlate with 
brain size and cortical thickness in a wide variety of frontal, parietal and 
temporal brain regions. 

The wide range of abilities in our sample was confirmed as follows: 
FSIQ ranged from 77 to 135 at time 1 and from 87 to 143 at time 2, with 
averages of 112 and 113 at times 1 and 2, respectively, and a tight 
correlation across testing points (r = 0.79; P < 0.001). Our interest 
was in the considerable variation observed between testing points at 
the individual level, which ranged from −20 to +23 for VIQ, −18 to 
+17 for PIQ and −18 to +21 for FSIQ. Even if the extreme values of 
the published 90% confidence intervals are used on both occasions, 
39% of the sample showed a clear change in VIQ, 21% in PIQ and 33% 
in FSIQ. In terms of the overall distribution, 21% of our sample showed 


Table 1 | Participants’ details 

                  Datum         Age          FSIQ         VIQ           PIQ
Time 1            Mean (s.d.)   14.1 (1.0)   112 (13.9)   113 (15.1)    108 (12.3)
                  Min, max      12.6, 16.5   77, 135      84, 139       74, 137
Time 2            Mean (s.d.)   17.7 (1.0)   113 (14.0)   116 (18.0)    107 (9.6)
                  Min, max      15.9, 20.2   87, 143      90, 150       83, 124
Correlation*      r             -            0.792†       0.809†        0.589†
Change            Mean (s.d.)   3.5 (0.2)    +1.0 (9.0)   +3.0 (10.6)   −1.0 (10.2)
(time 2 − time 1) Min, max      3.3, 3.9     −18, +21     −20, +23      −18, +17

* Correlation coefficient between scores at times 1 and 2. †P < 0.01. 
n = 33 (19 male, 14 female). s.d., standard deviation. 
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LETTER 


a shift of at least one population standard deviation (15) in the VIQ 
measure, and 18% in the PIQ measure. However, only one participant 
had a shift of this magnitude in both measures, and, for that particip- 
ant, one measure showed an increase and the other a decrease. This 
pattern is reflected in the absence of a significant correlation between 
the change in VIQ and the change in PIQ. The independence of 
changes in these two measures allows us to investigate the effect of 
each without confounding influences from the other. 

Figure 1 | Location of brain regions where grey matter changed with VIQ 
and PIQ. a, Correlation between change in grey matter density and change in 
VIQ (yellow) in the left motor speech region (peak in the left precentral gyrus at 
x = −47 mm, y = −9 mm, z = +30 mm, measured in Montreal Neurological 
Institute (MNI) space, with a Z-score of 5.2 and 681 voxels at P < 0.001). The 
corresponding effect on volume was slightly less significant (Z-score, 3.5; 118 
voxels at P < 0.001). b, Correlation between change in PIQ (red) and change in 
grey matter density in the anterior cerebellum (peak at x = +6 mm, 
y = −46 mm, z = +3 mm, in MNI space, with a Z-score of 3.9 and 210 voxels at 
P < 0.001). Both effects were significant at P < 0.05 after familywise error 
correction for multiple comparisons in extent based on the number of voxels in 
a cluster that survived P < 0.001 uncorrected. In addition, the VIQ effect was 
significant at P < 0.05 after familywise error correction for multiple 
comparisons in height. The statistical threshold used in the figure (P < 0.001) 
illustrates the extent of the effects. Plots show the change in grey matter density 
versus the change in both VIQ and PIQ at the voxel with the highest Z-score in 
the appropriate region. Linear regression lines are shown for significant 
correlations. Changes in the motor speech region correlated with changes in 
VIQ but not changes in PIQ, whereas changes in the anterior cerebellum 
correlated with changes in PIQ but not changes in VIQ (P < 0.001). n = 33; 
GMD, grey matter density. 

To test whether the observed IQ changes were meaningfully 
reflected in brain structure, we correlated them with changes in local 
brain structure. This within-subject correlation obviates the many 
possible sources of between-subject variance and may have sensitized 
our analysis to neural markers of VIQ and PIQ that have not previ- 
ously been revealed. Given the distributed nature of brain regions 
associated with between-subject differences in FSIQ, regions of 
interest were not used in this analysis, and the results of the whole-brain 
analysis were only considered to be significant at P < 0.05 after 
familywise error correction in either height (peak signal at a single 
voxel) or extent (number of voxels that were significant at P < 0.001). 

Using regression analysis, we studied the brain changes associated 
with a change in VIQ, PIQ or FSIQ (see Methods Summary for details). 
The results (Fig. 1) showed that changes in VIQ were positively corre- 
lated with changes in grey matter density (and volume) in a region of 
the left motor cortex that is activated by the articulation of speech. 
Conversely, changes in PIQ were positively correlated with grey matter 
density in the anterior cerebellum (lobule IV), which is associated with 
motor movements of the hand. Post hoc tests that correlated structural 
change with change in each of the nine VIQ and PIQ subtest 
scores that were common in the WISC and WAIS assessments found 
that the neural marker for VIQ indexed constructs that were shared by 
all VIQ measures and that the neural marker for PIQ indexed con- 
structs that were common to three of the four PIQ measures (Table 2). 
This indicates that our VIQ and PIQ markers indexed skills that were 
not specific to individual subtests. There were no other grey or white 
matter effects that reached significance in a whole-brain structural 
analysis of VIQ, PIQ or FSIQ. See Supplementary Information for 
details of further post hoc tests. 

Our findings that VIQ changes were related to a motor speech 
region and PIQ changes were related to a motor hand region are 
consistent with previous claims that cognitive intelligence is partly 
dependent on sensorimotor skills. Using functional imaging in 
the same 33 participants performing a range of sensory, motor and 
language tasks, we confirmed that the left motor speech region iden- 
tified in the VIQ structural analysis was more activated by articulation 
tasks (naming, reading and saying “one, two, three”) than by semantic 
or perceptual tasks that required a finger press response (see 
Supplementary Information for details). In contrast, a region very 
close to the anterior cerebellum region identified in the PIQ structural 
analysis was more activated during tasks involving finger presses than 
during tasks involving articulation. Figure 2 shows these results at both 
the group level and the individual level. The locations of the grey 


Table 2 | Correlations between grey matter density and score 

Test type          Subtest               Motor speech   Anterior
                                         region (r)     cerebellum (r)
Verbal tests       Vocabulary            0.284*         0.142
                   Similarities          0.438†         −0.021
                   Arithmetic            0.477†         0.304‡
                   Information           0.314‡         0.185
                   Comprehension         0.541†         0.183
Non-verbal tests   Picture completion    0.038          0.363‡
                   Digit symbol coding   0.003          −0.028
                   Block design          0.000          0.306‡
                   Picture arrangement   0.126          0.437†

* Trend (one-tailed) at P = 0.0545. †Significant (one-tailed) at P < 0.01. ‡Significant (one-tailed) at 
P < 0.05. 

Correlations were calculated using changes in scaled (that is, age-adjusted) scores in the various 
subtests that were common to both the WISC and the WAIS. The change in grey matter density in the 
motor speech region correlated significantly with changes in scores in four of the five verbal subtests, 
and there was a near-significant trend in the fifth but it did not correlate significantly with changes in 
scores in any of the four tests that comprise PIQ. Conversely, the change in grey matter density in the 
anterior cerebellum correlated significantly with changes in scores in three of the four tests that 
comprise PIQ (the exception being the digit symbol coding test, which has a particular loading on 
processing speed) but correlated with changes in scores in only one of the verbal tests (the arithmetic 
test, which probably has the smallest verbal component of the verbal tasks). 
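
The significance markers in Table 2 can be sanity-checked from the correlation coefficients and the sample size alone. The sketch below is an illustration, not the authors' procedure: it assumes the standard t-test for a Pearson correlation with n − 2 degrees of freedom and integrates the Student t density numerically with Simpson's rule so that only the Python standard library is needed. All function names are hypothetical.

```python
import math


def t_statistic(r: float, n: int) -> float:
    """t value for testing a Pearson correlation r from n paired observations."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def t_pdf(x: float, df: int) -> float:
    """Probability density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1.0 + x * x / df) ** (-(df + 1) / 2)


def one_tailed_p(r: float, n: int, steps: int = 2000) -> float:
    """Upper-tail probability for r, using Simpson's rule on the interval [0, t]."""
    df = n - 2
    t = t_statistic(r, n)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3.0


# Three coefficients taken from Table 2, with n = 33 participants.
for label, r in [("Vocabulary", 0.284), ("Arithmetic (cerebellum)", 0.304), ("Similarities", 0.438)]:
    print(f"{label}: r = {r}, one-tailed P = {one_tailed_p(r, 33):.4f}")
```

Under these assumptions the vocabulary coefficient falls just short of the 0.05 threshold, whereas the other two fall below 0.05 and 0.01 respectively, in line with the table footnotes.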
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Figure 2 | Functional activations in the regions identified by the structural 
analysis. The motor speech region was more activated by articulation tasks 
than by finger press tasks (x = −48 mm, y = −10 mm, z = +30 mm (MNI); 
t = 14.7; P < 0.05 familywise-error-corrected for multiple comparisons across 
the whole brain), and corresponds to the region identified in the structural 
analysis for VIQ. These effects were consistently observed at the same 
coordinates for all individual subjects. In the three exemplar participants shown 
here (P1, P2, P3), the Z-scores were 3.9, 3.5 and 3.0, respectively. The anterior 
cerebellum region was more activated during finger presses than during 
articulation at both the group level (peak at x = +6 mm, y = −48 mm, 
z = −4 mm (MNI); Z-score, 3.7; 216 voxels at P < 0.001 corrected for multiple 
comparisons in extent) and the individual level (P1: x = +12 mm, 
y = −48 mm, z = +2 mm (MNI); Z-score, 3.7; P2: x = +6 mm, y = −50 mm, 
z = −6 mm (MNI); Z-score, 3.3; P3: x = +12 mm, y = −46 mm, z = +2 mm 
(MNI); Z-score, 4.9). In all cases, the activation peaks were identified from 
whole-brain analyses and the peak effects for the correlation with structure are 
illustrated with blue cross hairs in both the structural results and the functional 
results. This illustrates that the location of the structural effects is within the 
regions identified by the functional effects. 


matter changes associated with VIQ and PIQ changes do not corre- 
spond to the anterior frontal and parietal regions associated with 
general intelligence (g factor). It may therefore be the case that g 
remains relatively constant across ages, but changes in the ability to 
perform individual subtests depend on changes in sensorimotor skills. 
It is also notable that although completion of the subtests comprising 
verbal and performance measures must implicate a network of brain 
regions, only structural changes in regions associated with sensorimo- 
tor skills showed correlations with changes in VIQ and PIQ. 

The changes in brain structure that correlated with changes in IQ 
allow us to explain some of the variance in terms of brain development. 
Specifically, 66% of the variance in VIQ at time 2 was accounted for by 
VIQ at time 1, a further 20% was accounted for by the change in grey 
matter density in the left motor speech region, with the remaining 14% 
unaccounted for. Similarly, 35% of the variance in PIQ at time 2 was 
accounted for by PIQ at time 1, with 13% accounted for by the change 
in grey matter density in the anterior cerebellum, leaving 52% un- 
accounted for. Future studies may be able to account for more of the 
between-subject variability by using a similar methodology with larger 
samples or other methodologies that measure structural or functional 
connectivity. 
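
The variance partition described above follows the logic of a hierarchical regression: the first predictor (IQ at time 1) is credited with its squared correlation with the outcome, and the second predictor (change in grey matter density) is credited only with what it adds beyond the first. The sketch below illustrates that logic on made-up numbers; the data values and function names are hypothetical and are not the study's data or code. It uses the fact that the increment in R² from one added predictor equals the squared correlation between the outcome and that predictor after it has been residualized on the first.

```python
from statistics import mean


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def residualize(target, predictor):
    """Residuals of target after a simple linear regression on predictor."""
    mt, mp = mean(target), mean(predictor)
    slope = (sum((p - mp) * (t - mt) for p, t in zip(predictor, target))
             / sum((p - mp) ** 2 for p in predictor))
    return [t - (mt + slope * (p - mp)) for t, p in zip(target, predictor)]


def hierarchical_r2(outcome, step1, step2):
    """Return (R2 for step 1, extra R2 added by step 2, unexplained share)."""
    r2_step1 = pearson_r(outcome, step1) ** 2
    delta_r2 = pearson_r(outcome, residualize(step2, step1)) ** 2
    return r2_step1, delta_r2, 1.0 - r2_step1 - delta_r2


# Hypothetical illustration data (not from the study).
iq_time1 = [90, 100, 105, 112, 120, 128, 135]
gmd_change = [0.03, -0.02, 0.01, -0.03, 0.02, -0.01, 0.00]
iq_time2 = [96, 97, 108, 108, 125, 126, 134]

step1, step2, rest = hierarchical_r2(iq_time2, iq_time1, gmd_change)
print(f"time-1 IQ: {step1:.0%}, added by GMD change: {step2:.0%}, unexplained: {rest:.0%}")

# The percentages reported in the text each partition the full variance.
assert 66 + 20 + 14 == 100  # VIQ
assert 35 + 13 + 52 == 100  # PIQ
```

Entering IQ at time 1 first means that any variance it shares with the structural change is attributed to the earlier score, so the structural term reflects only its additional contribution.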
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Our findings demonstrate considerable effects of brain plasticity in 
our sample during the teenage years, over and above normal develop- 
ment. By obviating the many sources of between-subject variance and 
controlling for global changes in brain structure, our within-subject 
analysis has allowed us to dissociate brain regions where structure reflects 
individual differences in verbal or non-verbal performance, in a way that 
has proved difficult in previous studies using behavioural data from a 
single point in time. We have also shown that the changes observed over 
time in the IQ scores of teenagers cannot simply be measurement error, 
because they correlate with independently measured changes in brain 
structure in regions that are plausibly related to the verbal and non- 
verbal functions tested. Further studies are required to determine the 
generalizability of this finding; for example, the same degree of plasticity 
may be present throughout life or the adolescent years covered by this 
study may be special in this regard. In addition, future work could 
consider the causes of the identified changes both in intelligence and 
in brain structure and how they impact on educational performance and 
employment prospects. The implication of our present findings is that an 
individual’s strengths and weaknesses in skills relevant to education and 
employment are still emerging or changing in the teenage years. 


METHODS SUMMARY 


This study was approved by the Joint Ethics Committee of the Institute of 
Neurology and the National Hospital for Neurology and Neurosurgery, 
London, UK. All structural and functional scans at times 1 and 2 were acquired 
from the same Siemens 1.5T Sonata MRI scanner (Siemens Medical Systems). The 
structural images were acquired using a T1-weighted modified driven equilibrium 
Fourier transform sequence with 176 sagittal partitions and an image matrix of 
256 × 224, yielding a final resolution of 1 mm³ (repetition time, 12.24 ms; echo 
time, 3.56 ms; inversion time, 530 ms). To pre-process the 66 structural images (33 
participants × 2 time points), we used SPM8 (http://www.fil.ion.ucl.ac.uk/spm) 
with the DARTEL toolbox to segment and spatially normalize the brains into the 
same template, with and without modulation. Modulated images incorporate a 
measure of local brain volume, whereas unmodulated images, used with propor- 
tional scaling to correct for global grey matter, provide a measure of regional grey 
matter density. Previous studies have shown that the correlations between 
brain structure and cognitive ability are better detected by grey matter density. 
Coordinates for each voxel were converted to standard MNI space. Images were 
smoothed using a Gaussian kernel with an isotropic full-width of 8 mm at half- 
maximum. The relationship between change in IQ and change in brain structure 
was investigated by entering the appropriate pre-processed images (modulated or 
unmodulated grey or white matter) into within-subject paired t-tests, with change 
in IQ (VIQ, PIQ or FSIQ) and year of scan as covariates. The degree to which IQ at 
time 2 was predicted by changes in brain structure was investigated in a hierarch- 
ical regression analysis with IQ at time 1 entered before change in brain structure. 
Details of the functional imaging method have been reported elsewhere and 
are summarized in Supplementary Information. 
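
For readers less familiar with smoothing kernels, the 8 mm full-width at half-maximum (FWHM) quoted above can be converted to the standard deviation of the equivalent Gaussian with the standard relation FWHM = 2·sqrt(2·ln 2)·σ. This is a minimal sketch of that conversion only; the function name is hypothetical and the snippet does not reproduce the SPM pipeline.

```python
import math


def fwhm_to_sigma(fwhm_mm: float) -> float:
    """Standard deviation (mm) of a Gaussian with the given full-width at half-maximum."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


print(f"{fwhm_to_sigma(8.0):.2f} mm")  # prints 3.40 mm
```

An 8 mm FWHM kernel therefore corresponds to a Gaussian with a standard deviation of roughly 3.4 mm, or about 3.4 voxels at the 1 mm resolution stated above.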
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Killer cell immunoglobulin-like receptor 3DL1- 
mediated recognition of human leukocyte antigen B 

Philippa M. Saunders, Maya A. Olshina, Jacqueline M. L. Widjaja, Christopher M. Harpur, Jie Lin, Sebastien M. Maloveste, 
David A. Price, Bernard A. P. Lafont, Daniel W. McVicar, Craig S. Clements, Andrew G. Brooks & Jamie Rossjohn 


Members of the killer cell immunoglobulin-like receptor (KIR) 
family, a large group of polymorphic receptors expressed on 
natural killer (NK) cells, recognize particular peptide-laden human 
leukocyte antigen (pHLA) class I molecules and have a pivotal role 
in innate immune responses. Allelic variation and extensive 
polymorphism within the three-domain KIR family (KIR3D, 
domains D0-D1-D2) affect pHLA binding specificity and are 
linked to the control of viral replication and the treatment outcome 
of certain haematological malignancies. Here we describe the 
structure of a human KIR3DL1 receptor bound to HLA-B*5701 
complexed with a self-peptide. KIR3DL1 clamped around the 
carboxy-terminal end of the HLA-B*5701 antigen-binding cleft, 
resulting in two discontinuous footprints on the pHLA. First, the 
D0 domain, a distinguishing feature of the KIR3D family, 
extended towards β2-microglobulin and abutted a region of the 
HLA molecule with limited polymorphism, thereby acting as an 
‘innate HLA sensor’ domain. Second, whereas the D2-HLA-B*5701 
interface exhibited a high degree of complementarity, 
the D1-pHLA-B*5701 contacts were suboptimal and accommodated 
a degree of sequence variation both within the peptide and 
the polymorphic region of the HLA molecule. Although the two-domain 
KIR (KIR2D) and KIR3DL1 docked similarly onto 
HLA-C and HLA-B respectively, the corresponding D1-mediated 
interactions differed markedly, thereby providing insight into 
the specificity of KIR3DL1 for discrete HLA-A and HLA-B 
allotypes. Collectively, in association with extensive mutagenesis 
studies at the KIR3DL1-pHLA-B*5701 interface, we provide a 
framework for understanding the intricate interplay between 
peptide variability, KIR3D and HLA polymorphism in determin- 
ing the specificity requirements of this essential innate interaction 
that is conserved across primate species. 

HLA-B57 carriage has been associated with delayed progression to 
AIDS in HIV-infected individuals, with a strong genetic association 
between the KIR3DL1-HLA-B57 interaction, reduced viral loads and 
delayed HIV disease progression. We expressed KIR3DL1*001, a 
prototypical family member, and co-complexed it with HLA-B*5701 
bound to a self-peptide (LSSPVTKSF). The affinity (KD) of this interaction 
was approximately 17 μM (Supplementary Table 1 and Supplementary 
Fig. 1). We then determined the KIR3DL1*001-HLA-B*5701-LSSPVTKSF 
structure to 1.8 Å resolution (Supplementary 
Table 2 and Supplementary Fig. 2). KIR3DL1*001 clamped around 
the C-terminal end of the HLA-B*5701 antigen-binding cleft (Fig. 1a, 
b), forming an extensive interface (total buried surface area (BSA), 
1,740 Å²) that encompassed two discontinuous sites, one mediated 
via the D0 domain and the other via the D1-D2 domains (Figs 1c, d 
and 2a-d). KIR3DL1*001 adopted an elongated, zigzag conformation, 
with the three immunoglobulin (Ig) domains, termed D0, D1 and D2 


(residues 7-98, 99-198 and 203-292, respectively) defined by the 
E-type Ig fold topology (Fig. 1a). The D0 domain, a feature of the 
KIR3D family, packed against the D1 domain, the relative juxtapositioning 
of which (83°) is similar to that of the D1-D2 inter-domain 
angle (81°), which in turn is analogous to the relative orientation of 
D1-D2 domains (76°) found in the KIR2D receptors (root mean 
squared deviation (r.m.s.d.) of D1-D2 domains in KIR2DL1 and 


Figure 1 | Structure of the KIR3DL1*001-pHLA-B*5701 complex. 

a, b, Orthogonal views of the complex with the KIR3DL1*001 β-strands 
labelled. The HLA and β2-microglobulin (β2m) are coloured green and cyan, 
respectively; D0, D1, D1-D2 loop and D2 are coloured yellow, blue, pink and 
orange, respectively; dashed line represents the unresolved loop between the E 
and F β-strands. c, d, The footprint mapped to the surface of HLA and 
KIR3DL1*001, respectively, with residues coloured in each case according to 
the interacting KIR3DL1*001 domain: D0 (yellow), D1 (blue), D1-D2 loop 
(pink) and D2 (orange). Residues that contact the linker and the D2 domain are 
coloured brown. 

Figure 2 | Contacts between the KIR3DL1*001 receptor and pHLA-B*5701. 
Panels coloured as in Fig. 1. Waters are shown as cyan spheres; 
hydrogen bonds as black lines; van der Waals contacts as red lines. a, Contacts 
between the D0 domain and pHLA. b, Contacts between the receptor and the 
α2 helix of pHLA. c, Contacts to the peptide. Residues from the D1 and D2 
domains form a single van der Waals and three water-mediated contacts to the 
P8 and P9 peptide positions. d, Contacts between the receptor and the α1 helix. 
The interface between the D1 domain and the α1 helix was suboptimal, 
comprising a single direct hydrogen bond from Gly 138 to Arg 79. 


KIR3DL1 is 1.2 Å) (Supplementary Fig. 3a). Further, the positioning 
of the D0 domain relative to the D1 and D2 domains appears to be 
fixed (Supplementary Fig. 3b, c), thereby generating a pre-formed 
pHLA-binding receptor. 
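
To put the reported affinity of approximately 17 μM in more familiar energetic terms, a dissociation constant can be converted to a standard binding free energy with ΔG° = RT ln(KD). The sketch below is a hedged illustration only: the temperature (298.15 K) and the 1 M standard state are assumptions, since the measurement conditions are not given in this text, and the function name is hypothetical.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # gas constant in kcal per mol per kelvin


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy (kcal/mol) from a dissociation constant in mol/l.

    Assumes a 1 M standard state; the default temperature is an assumption,
    not a value taken from the paper.
    """
    if kd_molar <= 0:
        raise ValueError("the dissociation constant must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(kd_molar)


print(f"{binding_free_energy(17e-6):.1f} kcal/mol")  # prints -6.5 kcal/mol
```

Under these assumptions the interaction is worth roughly −6.5 kcal/mol, a micromolar-range affinity.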

The D0 domain contributed 30% BSA upon complexation with 
ligand, being orientated almost perpendicular to the main axis of the 
antigen-binding cleft, extending towards, and just contacting, 
β2-microglobulin (Fig. 1a). A surface-exposed aromatic cluster (Phe 9, 
Trp 13, His 29, Phe 34) on one face of the D0 domain ligated to loops 
comprising residues 14-18 and 88-92 of HLA-B*5701 (Fig. 2a 
and Supplementary Table 3), both of which flexed slightly upon 
KIR3DL1*001 binding (Supplementary Fig. 4). These two HLA loops 
exhibit very limited polymorphism among the HLA-A and HLA-B 
allotypes and mostly have main-chain interactions with the D0 
domain, thereby indicating that the D0-HLA interactions are largely 
independent of sequence variation and likely to be conserved across 
most HLA allotypes. Lengthening or shortening the HLA-B*5701 loop 
(residues 14-18) markedly reduced binding to KIR3DL1*001 (Fig. 3a). 
Alanine substitution of Ser 11, His 29 and particularly Phe 9 in 
KIR3DL1*001 impaired binding of HLA-B*5701 tetramers, further 
highlighting the importance of the D0 contacts (Fig. 3b). Interestingly, 
the site of the D0-mediated interaction on HLA-B*5701 has not, to the 
best of our knowledge, been observed in any HLA-binding immune 
receptor/co-receptor to date, indicating a unique molecular recognition 
signature, in which the D0 domain acts as an ‘innate sensor’ 
of an essentially invariant region of the HLA molecule. 

The D1-D2 domains converged to form a continuous binding inter- 
face with HLA-B*5701 (Fig. 1c, d), interacting with residues from the 
o1- and «2-helices flanking the P8 position of the peptide. The ligand- 
binding site of the D1-D2 domains was relatively flat, facilitating the 
close positioning of HLA-B*5701, resulting in an intricate network of 
interactions across the interface; as such, the total BSA upon complexation 
at the D1-D2 interaction site was quite large (total BSA, 
1,360 Å²). The D1 and D2 domains contributed 600 and 760 Å² total 
BSA to the interface, respectively, with the D1 domain docked above 
the α1-helix and contacting the peptide, whereas the D2 domain sat 
above the α2-helix, thereby providing immediate insight into the disparate 
roles that the D1 and D2 domains have in HLA-B*5701 engagement 
(Fig. 1c, d). The D2 domain predominantly interacted with a 
region spanning residues 142-151 of HLA-B*5701 (Supplementary 
Table 3), a region that shows limited polymorphism among HLA-B 
allotypes. At the core of the D2-HLA-B*5701 binding interface, two 
aromatic residues of KIR3DL1*001, Tyr 200 and Phe 276, converged 
onto the α2-helix, whereas polar interactions were located at the periphery 
(Fig. 2b). A feature of this interface was the centrally located 
Glu 282 of KIR3DL1*001, a charged residue that abuts Leu 166 from 
the D1 domain, yet is stabilized by polar interactions with Tyr 200 and 
Ser 279 of KIR3DL1*001, Lys 146 of HLA-B*5701 and water-
mediated interactions with the peptide and Arg 83 (not shown) on 
the α1-helix (Fig. 2c). Alanine substitution of Glu 201, Ser 227, 
Asp 230 or His 278, residues that were located at the exterior of the 
interface, had little effect on binding (Fig. 3b). In contrast, alanine 
substitution of Tyr 200 or Phe 276, which formed the central aromatic 
cluster, or the charged residue Glu 282, abrogated tetramer binding 
(Fig. 3b). Further, of the five HLA-B*5701 mutations made at the D2- 
HLA-B*5701 interface, three residues (Ile 142, Lys 146 and Ala 149) 
markedly affected the affinity of the interaction (Fig. 3a). These three 
HLA-B*5701 residues interacted principally with Tyr200 and 
Phe 276, further highlighting the importance of this internal core of 
KIR3DL1*001 residues in driving the D2-HLA-B*5701 interaction. 
Collectively, the D2-HLA-B*5701-binding site seems to have co- 
evolved to form a highly complementary binding interface. 

KIR3DL1 recognizes HLA class I allotypes that contain the Bw4 
serological epitope spanning residues 77-83 on the α1-helix. 

Figure 3 | Mutational analysis at the KIR3DL1*001-pHLA-B*5701 
interface. a, Surface plasmon resonance (SPR)-based measurements of the 
KIR3DL1*001-HLA-B*5701 mutants interaction. Results are expressed as 
percentage of the wild-type interaction; mutants are colour-coded according to 
the KIR3DL1*001 domain they contact to correspond with Fig. 1. b, Capacity 
of HLA-B*5701 tetramers to bind 293T cells expressing wild-type or mutant 
KIR3DL1*001. HLA-B*5701 tetramers, but not HLA-B*0801 tetramers (data 
not shown), bound 293T cells transfected with KIR3DL1*001. Binding is 
expressed as a proportion of positive cells relative to cells transfected with wild- 
type KIR3DL1*001. Mutated residues are colour-coded as in Fig. 1. N = 2 
independent experiments; error bars represent s.e.m. 

While the D1 domain was positioned over the Bw4 epitope making 
contacts with residues 79, 80 and 83, it interacted with a broader region 
of the α1-helix, including Gln 72, which bound to Met 165 (Fig. 2d and 
Supplementary Table 3). In marked contrast to the D2-mediated contacts, 
the D1-HLA-B*5701 interface appeared to largely lack both 
charge and shape complementarity (Fig. 2d and Supplementary Fig. 
5). Among the residues within the Bw4 motif, Arg 79 formed van der 
Waals contacts with Ser 140 and hydrogen-bonded to the main chain 
of Gly 138. Nevertheless, the environment of Arg 79 was suboptimal, 
with its side chain being in close proximity to Lys 136 and Ile 139 of 
KIR3DL1*001. Ile 80, a residue previously associated with KIR3DL1 
reactivity, formed a single van der Waals contact with Leu 166 and 
was positioned within a small hydrophobic cavity created by Glu 76, 
Arg 79 and Arg 83, a triad of HLA-B*5701 residues that leaned 
towards each other to form an array of salt-bridging interactions 
(Fig. 2d). Further, Arg 83 from the Bw4 motif packed against and 
hydrogen-bonded to the main chain of His 278 (Fig. 2d). 
Surprisingly, none of the five alanine mutations introduced into the 
D1 domain had a substantial effect on the KIR3DL1-pHLA-B*5701 
interaction (Fig. 3b). In contrast, mutation of the corresponding 
HLA-B*5701 contact residues did affect recognition, particularly 
the Ile80Ala and Arg83Ala mutations (Fig. 3a). Interestingly, 
mutation of Ile 80 to Thr, a natural dimorphism within the Bw4 motif, 
resulted in a modest reduction in the affinity of the interaction with 
KIR3DL1*001 (Fig. 3a). Presumably, the Ile80Ala and Ile80Thr muta- 
tions differentially disrupt the conformation of the Glu 76-Arg 79- 
Arg 83 triad, thereby affecting KIR3DL1 recognition. Thus, whereas 
KIR3DL1*001 contacted the highly polymorphic region of the HLA 
class I in a non-optimal manner, and the D1 residues were shown to be 
non-essential for this interaction, modifications within the HLA itself 
affected the D1-HLA-B*5701 interaction and thus could serve to fine- 
tune the specificity of the interaction. Indeed, although KIR3DL1*001 
specifically binds HLA molecules that possess the Bw4 motif, it does 
not interact with the closely related Bw6 motif, which possesses a Gly at 
position 83. Accordingly, our data provide a basis for understanding 
the importance of polymorphism at residue 83 for KIR3DL1 recognition 
of the Bw4+ epitope. 

The D1 domain interacted with the LSSPVTKSF peptide; however, 
the sole direct interaction between the peptide and KIR3DL1*001 was 
a van der Waals contact between P8-Ser (where P8 is position 8 of the 
peptide) and Leu 166 (Fig. 2c). Thus, KIR3DL1*001 made limited 
contact with the peptide, analogous to the interactions observed 
between KIR2D and peptides bound to HLA-C, and in marked 
contrast to CD94-NKG2A recognition of HLA-E. To probe the role 
of peptide in the interaction, a series of peptides that were substituted 
at P8 were refolded with HLA-B*5701 and assessed for their impact on 
recognition by KIR3DL1*001. The Phe, His and Arg P8 substitutions 
all facilitated an interaction with KIR3DL1*001, albeit with lower 
affinities, suggesting that the receptor interface has some capacity to 
tolerate large side chains at P8, consistent with the presence of a 
solvent-filled cavity adjacent to the P8 position at the KIR3DL1*001- 
pHLA-B*5701 interface. In contrast, the Ala, Glu and Leu P8 substitu- 
tions markedly reduced the corresponding interaction affinities 
(Supplementary Table 1 and Supplementary Fig. 1), suggesting that 
the KIR3DL1*001 receptor can ‘discriminate’ between peptides. The 
basis for the differential effects of the P8 residue could be attributable 
either to direct steric hindrance/lack of complementarity between the 
peptide and KIR3DL1*001, or to conformational alteration of the 
residues within the Bw4 motif itself. Collectively, our observations 
are consistent with previous studies, which demonstrated that the 
sequence of the bound peptide could have a profound effect on HLA 
recognition by KIR. 

Next, we assessed the underlying HLA specificities of the KIR2DL 
and KIR3DL receptor families (Supplementary Figs 6 and 7). The 
D1-D2 domains of KIR3DL1*001 share clear sequence and structural 
homology with the HLA-C-reactive receptors, KIR2DL1, -2 and -3 
(refs 4, 5, 15), and there are a number of similarities in the recognition 
of the α2-helix by both KIR2DL1 and -2 and KIR3DL1*001 (Supplementary 
Fig. 7). In contrast, the interactions between the 
KIR3DL1*001 and KIR2DL1 receptors and the α1-helices of their 
respective HLA class I ligands vary (Supplementary Fig. 6b). These 
differences principally arise from the loop regions that connect the C 
and C′ β-strands and the E and F β-strands in the D1 domain and the F 
and F′ β-strands that bridge the D1 and D2 domains. For example, the 
CC′ loop in the KIR2DL receptors adopts a notably different conformation 
from that observed in KIR3DL1*001 (Supplementary Fig. 6c). In 
KIR3DL1*001, this loop (137-140) is mostly flat and featureless, sitting 
adjacent to the α1-helical axis, forming limited contacts with HLA-B*5701. 
In the KIR2DL receptors, the corresponding loop region 
(42-45) is orientated towards the α1-helix, and contains two prominent 
residues that would prevent binding to HLA-B*5701 owing to steric 
hindrance with residues within the Bw4 motif. Thus, the D1-mediated 
contacts are critical for the HLA specificity differences between the 
KIR2DL1 family and KIR3DL1*001. 

The KIR3D family comprises the KIR3DL1/S1, KIR3DL2 and 
KIR3DL3 proteins. More than 200 alleles within the KIR3D family 
have been described, with KIR3D allomorphs generally differing from 
each other by a limited number of amino acids. Given the high 
sequence identity between KIR3DL1*001 and KIR3DL2, KIR3DL3 
and KIR3DS1 receptors (86, 74 and 97%, respectively), the 
KIR3DL1*001-HLA-B*5701 structure provided a template to exam- 
ine the impact of sequence variation across the entire KIR3D family 
and relate this to pHLA specificity. Sequence and structural analyses 
suggested that a ‘hotspot’ resided within the D1-D2 domains, com- 
prising loops 165-167, 199-201 and 278-282, all of which converged 
to form an intricate bonded network that centred on Glu 282 (Fig. 4a-e). 
Variation within these three loops could potentially alter the conforma- 
tion of neighbouring residues within this hotspot region, thereby affect- 
ing receptor specificity. 

KIR3DS1 is distinct among the KIR3D family in that it is an 
activating receptor. Genetic data have shown that KIR3DL1 and 
KIR3DS1 are allelic variants of the same gene and suggested that 
KIR3DS1 interacts with HLA-Bw4 molecules bearing an Ile at residue 
80 (Bw4+I80). However, direct evidence of an interaction between 
KIR3DS1 and Bw4+I80 molecules is lacking. Four positions that 
differ between KIR3DL1 and KIR3DS1 map to the KIR3DL1*001-
pHLA-B*5701 interface and thus may affect the interaction (Fig. 4c), 
consistent with recent observations using HLA-A24 tetramers. 
Whereas the Gly138Trp and Pro199Leu mutations had little impact 
on HLA-B*5701 binding, mutation of Leu 166 (which is located 
within the hotspot) to Arg substantially diminished tetramer binding 
(Fig. 4f), thereby providing a basis for why KIR3DS1 cannot bind 
HLA-B*5701. 

The KIR3DL2 family recognizes a limited subset of HLA-A allotypes, 
and seven sequence differences from KIR3DL1*001 map to the hotspot region 
(Fig. 4d). The introduction of these residues into KIR3DL1*001 showed 
that whereas the Leu166Pro, Ala167Val (Fig. 4f) and His278Ala mutations 
(Fig. 3b) did not impair recognition of HLA-B*5701, the 
Ser279Leu and Glu282Val mutations markedly reduced tetramer binding 
(Fig. 4f). Removal of the charged moiety of Glu 282 would disrupt the 
intricate network of interactions at the KIR3DL1*001-pHLA-B*5701 
interface. Whereas the Ser279Ala mutation did not abrogate HLA-B*5701 
binding (Fig. 3b), the impact of the Ser279Leu mutation was 
much more pronounced. This effect appears attributable to the more 
bulky Leu residue causing a steric clash with Arg 83, thereby suggesting a 
basis for the lack of reactivity of KIR3DL2 towards the Bw4 motif. 
Moreover, unlike HLA-B*5701 and other HLA-Bw4 allotypes, HLA-A3 
and HLA-A11 possess a Gly at position 83 rather than Arg, which is a 
crucial determinant for KIR3DL1 recognition of the Bw4 motif. 

The specificity of the KIR3DL3 receptor family is undefined, and a 
number of differences between KIR3DL1 and -3DL3 reside within the 
hotspot region (Fig. 4e). Binding experiments showed that the 

Figure 4 | Mapping of polymorphisms and sequence variations onto the 
structure of KIR3DL1*001. a, The ‘hotspot’ comprises three loops. 
b, Polymorphisms within the KIR3DL1 family. c-e, Differences between 
KIR3DS1*013 and KIR3DL1*001 (c), KIR3DL2*001 and KIR3DL1*001 
(d), KIR3DL3*001 and KIR3DL1*001 (e). Polymorphisms are represented as 
spheres: red, direct contacts; cyan, water-mediated contacts; green, residues 

Met165Thr or Leu166Pro substitutions in KIR3DL1*001 did not affect 
HLA-B*5701 binding (Fig. 4f). Further, whereas the Pro199Leu substitution 
had a modest impact on recognition, the Glu282Ala substitution 
within KIR3DL1*001 totally abrogated tetramer binding 
(Fig. 3b), thereby indicating that residues 279 and 282 are critical 
determinants of the specificity differences between KIR3DL1 and 
other KIR3D family members. 

Surprisingly, the extensive polymorphism among the inhibitory 
receptors within each KIR3D family was predominantly located at 
sites not directly implicated in pHLA binding (Fig. 4b). Collectively, 
these observations indicate that the majority of KIR3D polymorphisms 
within a family are unlikely to directly affect the affinity of the 
pHLA interaction per se, but rather are likely to affect pHLA binding 
via altering expression levels and/or the clustering of the KIR3D recep- 
tors on the cell surface, whereas sequence differences across the KIR3D 
family directly affect pHLA affinity and specificity. Indeed, functional 
studies have shown that polymorphisms in residues such as 238 that 
are distant from the receptor/ligand interface can affect target cell 
recognition by KIR3DL1+ NK cells. 

Collectively, our data provide a fundamental basis for understand- 
ing how a representative KIR3DL family member interacts with an 
HLA-B molecule that possesses the Bw4 motif. We show that the D0 
domain, a feature of this family, interacts with a previously unrecognized 
determinant on the HLA molecule, which is highly conserved 
across HLA-A and HLA-B allotypes in particular. These observations 
indicate that the D0 domain acts as an innate HLA sensor at a site 
that is not involved in either peptide or TCR binding. The 
KIR3DL interaction sites seem to be largely conserved across the 
KIR3D family, with specificity differences mapping to a hotspot 
that may affect binding; other, remaining residues coloured according to 
domain. f, The capacity of HLA-B*5701 tetramers to bind to 293T cells 
transfected with plasmids encoding either a Flag-tagged KIR3DL1*001 or 10 
site-directed mutants representing sites of 3DL1/2/3/3DS1 variation that 
contacted HLA-B*5701. N = 2 independent experiments; error bars represent 
s.e.m. Variations across the KIR3D family are shown underneath. 


within the interaction interface. In contrast, the polymorphisms 
within individual KIR3D gene families are largely at positions that 
are spatially separate from the binding site, a number of which are 
the subject of positive selection. This suggests that other evolutionary 
pressures, such as pathogen-mediated immune evasion strategies, may 
drive KIR3D diversification at sites distant from the ligand-binding 
site. 


METHODS SUMMARY 

Protein expression and purification. Inclusion body preparations of the 
HLA-B*5701 heavy chain and β2-microglobulin were refolded and purified as detailed 
previously. Residues 1-299 of KIR3DL1*001 were cloned into the pHLsec mammalian 
expression vector with N-terminal 6×His and secretion tags. 
KIR3DL1*001 was expressed from transiently transfected HEK 293S cells. 
Purified KIR3DL1*001 was then concentrated to 15 mg ml⁻¹ and deglycosylated 
with endoglycosidase H (New England Biolabs). 

Crystallization and data collection. The KIR3DL1*001-pHLA-B*5701 complex 
was crystallized and its structure determined. Further details are provided in 
Methods. 

Transfection studies. The sequence for a Flag tag (GACTACAAAGACGATGACGACAAG) 
was added to the 5′ end of KIR3DL1*001 by primer addition and this 
cDNA was then cloned into a pEF6 vector. Specific nucleotide residues were 
mutated using the QuikChange II Site Directed Mutagenesis Kit (Stratagene). 
Plasmids were transfected into HEK293T cells using the FuGene 6 transfection 
reagent (Roche) according to the manufacturer’s instructions. After 48 h, the cells 
were harvested and stained with anti-Flag (clone M2, Sigma Aldrich) antibody or 
with tetramer for 30 min at 4°C. The cells were then washed and analysed on a 
Fortessa flow cytometer (BD Biosciences). 

Surface plasmon resonance. Surface plasmon resonance experiments were 
conducted at 25 °C on a Biacore 3000 instrument using HBS buffer (10 mM 
HEPES-HCl (pH 7.4), 150 mM NaCl and 0.005% surfactant P20 supplied by the 
manufacturer). Further details are provided in Methods. 
Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 


Protein expression and purification. HLA-B*5701 and β2-microglobulin were 
expressed separately in E. coli from the pET-30 vector. Inclusion body preparations 
of the HLA-B*5701 and β2-microglobulin were refolded and purified as 
detailed previously. In brief, the resultant HLA class I complexes were purified 
by DEAE sepharose (Sigma) anion exchange chromatography using 10 mM Tris 
pH 8.0 and eluted with 150 mM NaCl. The protein was then further purified by gel 
filtration using an S200 16/60 column (GE Healthcare). The final purification step 
used anion exchange chromatography on a MonoQ column (GE Healthcare). The 
binary complex was concentrated in 10 mM Tris pH 8.0, 150 mM NaCl for use in 
crystallization trials and surface plasmon resonance (SPR) studies. The mutants of 
HLA-B*5701 were generated using the QuikChange PCR method (Stratagene) 
and purified as described earlier. 

Residues 1-299 of KIR3DL1*001 were cloned into the pHLsec mammalian 
expression vector with N-terminal 6×His and secretion tags. KIR3DL1*001 was 
expressed from transiently transfected HEK 293S cells. Secreted KIR3DL1*001 
was harvested from the culture media 3 days after transfection by first dialysing the 
media against 10 mM Tris pH 8.0, 300 mM NaCl before the use of nickel affinity 
resin. The KIR3DL1*001 was eluted from the nickel resin with 10 mM Tris pH 8.0, 
300 mM NaCl, 50 mM EDTA. The protein was purified by gel filtration chromatography 
using an S200 16/60 column (GE Healthcare) in 10 mM Tris pH 8.0, 
300 mM NaCl. Purified KIR3DL1*001 was then concentrated to 15 mg ml⁻¹ 
and deglycosylated with endoglycosidase H (New England Biolabs). The extent 
of deglycosylation was monitored by SDS-PAGE and this material was used in 
crystallization trials. For SPR studies a similar construct of KIR3DL1*001 was 
prepared in the pFastBac vector and expressed from Hi-5 insect cells 
(Invitrogen). The KIR3DL1*001 was purified as described earlier with the excep- 
tion that the endoglycosidase H deglycosylation step was not performed. 
Crystallization and data collection. The KIR3DL1*001-pHLA-B*5701 complex 
at 15 mg ml⁻¹ was crystallized at 294 K by the hanging-drop vapour-diffusion 
method from a solution comprising 16% PEG 3350, 2% tacsimate pH 5 and 
0.1 M tri-sodium citrate pH 5.6. The crystals typically grew to dimensions 
0.3 × 0.3 × 0.2 mm in 7 days. Before data collection, the crystals were equilibrated 
in crystallization solution with 35% PEG 3350 added as a cryoprotectant and then 
flash-cooled in a stream of liquid nitrogen at 100 K. X-ray diffraction data were 
recorded on a Quantum-315 CCD detector at the MX2 beamline of the Australian 
Synchrotron. The data were integrated and scaled using DENZO and 
SCALEPACK from the HKL2000 program suite. Details of the data processing 
statistics are given in Supplementary Table 2. 

The final model comprises residues 6-261 and 267-292, and there are three 
glycosylation sites located at Asn 71, Asn 158 and Asn 252. 

Structure determination and refinement. The structure was determined by 
molecular replacement using MOLREP. The search models used were the struc- 
tures of HLA-B*5701 and KIR2DL1 (PDB codes 2REX and 1IM9). The positions 
of the two complexes in the asymmetric unit were found in an incremental manner. 
The orientation of the first HLA molecule was found and subsequently 
the D1 and D2 domains of the KIR receptor were placed. The second complex 
was fitted by application of the pseudo-translation vector 0.0, 0.5, 0.5. 
Refinement of the model was carried out in REFMAC with strict twofold 
non-crystallographic symmetry (NCS) applied. Structure building proceeded with 
iterative rounds of manual building in COOT and refinement in REFMAC. The 
D0 domain of KIR3DL1*001 was manually built from the resultant electron 
density maps. The NCS restraints were removed for the final rounds of refinement. 
Solvent was added with COOT and the structure validated with MOLPROBITY (ref. 29). 
The final structure comprises two KIR3DL1*001-pHLA-B*5701 complexes in the 
asymmetric unit, the association of which did not indicate higher-order oligomeric 
assemblies within the crystal lattice. The final refinement values are summarized in 
Supplementary Table 2. The crystals contained two virtually indistinguishable 
ternary complexes within the asymmetric unit, so structural analyses were confined 
to one KIR3DL1*001-pHLA-B*5701 complex. 
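
As an illustration of how a pseudo-translation vector relates the two complexes, the sketch below shifts fractional coordinates by the reported vector (0.0, 0.5, 0.5) and wraps them back into the unit cell. The atom coordinates and the function name are hypothetical and serve only to show the operation; this is not the procedure implemented in MOLREP.

```python
from typing import List, Tuple

Frac = Tuple[float, float, float]

# Pseudo-translation vector reported in the text (fractional units).
PSEUDO_TRANSLATION: Frac = (0.0, 0.5, 0.5)


def apply_translation(coords: List[Frac], vector: Frac) -> List[Frac]:
    """Shift fractional coordinates by a vector and wrap them into [0, 1)."""
    return [
        tuple((c + v) % 1.0 for c, v in zip(xyz, vector))
        for xyz in coords
    ]


# Hypothetical fractional coordinates for three atoms of the first complex.
first_complex = [(0.10, 0.20, 0.30), (0.25, 0.75, 0.90), (0.50, 0.50, 0.50)]

# Approximate placement of the second complex in the asymmetric unit.
second_complex = apply_translation(first_complex, PSEUDO_TRANSLATION)

for before, after in zip(first_complex, second_complex):
    print(before, "->", tuple(round(x, 3) for x in after))
```

Because the relationship is only a pseudo-translation, the copy generated this way is a starting position that refinement would then adjust.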
Transfection studies. The sequence for a Flag tag (GACTACAAAGACGATGACGACAAG) 
was added to the 5′ end of KIR3DL1*001 by primer addition and this 
cDNA was then cloned into a pEF6 vector. Specific nucleotide residues were 
mutated using the QuikChange II Site Directed Mutagenesis Kit (Stratagene) 
according to the manufacturer’s instructions using PAGE-purified primers. 
Sequences were verified by direct sequencing. These constructs were introduced 
into HEK293T cells using FuGene 6 transfection reagent (Roche) according to the 
manufacturer’s instructions. After 48 h, the cells were harvested and stained with 
anti-Flag (clone M2, Sigma Aldrich) antibody or with tetramer for 30 min at 4 °C. 
The cells were then washed and analysed on a Fortessa flow cytometer (BD 
Biosciences). Analysis of cell surface expression as assessed by staining with 
anti-Flag monoclonal antibody showed that the introduction of the mutations 
had no substantial effect on expression (data not shown). All transfection data 
are representative of two independent experiments. 
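
The Flag-tag nucleotide sequence quoted above can be checked by translating it codon by codon. The minimal sketch below uses a partial codon table that lists only the codons occurring in this sequence (the assignments follow the standard genetic code); the function and variable names are illustrative.

```python
# Partial codon table: only the codons present in the Flag-tag sequence.
CODON_TABLE = {
    "GAC": "D",  # aspartate
    "GAT": "D",  # aspartate
    "TAC": "Y",  # tyrosine
    "AAA": "K",  # lysine
    "AAG": "K",  # lysine
}

# Sequence as given in the Methods.
FLAG_DNA = "GACTACAAAGACGATGACGACAAG"


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence into one-letter amino acid codes."""
    dna = dna.upper().replace(" ", "")
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


flag_peptide = translate(FLAG_DNA)
print(flag_peptide)  # prints DYKDDDDK
```

The eight-residue product, DYKDDDDK, is the Flag epitope recognized by the anti-Flag M2 antibody used for staining.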
SPR. SPR experiments were conducted at 25 °C on a Biacore 3000 instrument using 
HBS buffer (10 mM HEPES-HCl (pH 7.4), 150 mM NaCl and 0.005% surfactant P20 
supplied by the manufacturer). The HLA class I-specific antibody W6/32 was immobilized 
on a CM5 chip via amine coupling according to manufacturer’s instructions. 
The pHLA complexes, and mutants thereof, were captured by W6/32 creating a 
surface density of approximately 500-1,000 resonance units. Various concentrations 
of KIR3DL1*001 (2.37 to 300 µM) were injected over the captured pHLA at 5 µl 
min⁻¹. The final response was calculated by subtracting the response of W6/32 alone 
from the KIR3DL1*001-pHLA-B*5701 complex. The equilibrium data were ana- 
lysed using GraphPad Prism. The shortened form of HLA-B*5701 comprised Gly- 
Gly-Gly in place of residues 14-19; in the long form of HLA-B*5701, Gly-Gly-Gly 
was inserted after Gly 16. For the SPR experiments, data are representative of two 
independent experiments with error bars representing s.e.m. of the duplicates. 
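
The reference subtraction and equilibrium analysis described above can be sketched in a few lines. The equilibrium data in this study were analysed with GraphPad Prism; the code below is only an illustration that assumes a single-site steady-state model, R = Rmax·C/(KD + C), which the text does not state explicitly. The concentration series, response values and function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def subtract_reference(sample: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Subtract the reference response (W6/32 alone) from the sample response."""
    return [s - r for s, r in zip(sample, reference)]


def one_site(conc: float, kd: float, rmax: float) -> float:
    """Steady-state response for a single-site binding model (assumed model)."""
    return rmax * conc / (kd + conc)


def fit_one_site(concs: Sequence[float], responses: Sequence[float],
                 kd_min: float = 0.1, kd_max: float = 1000.0,
                 steps: int = 2000) -> Tuple[float, float]:
    """Scan KD on a log grid; for each KD solve Rmax by linear least squares."""
    best = (float("inf"), 0.0, 0.0)
    for i in range(steps + 1):
        kd = kd_min * (kd_max / kd_min) ** (i / steps)
        x = [c / (kd + c) for c in concs]
        rmax = sum(r * xi for r, xi in zip(responses, x)) / sum(xi * xi for xi in x)
        sse = sum((r - rmax * xi) ** 2 for r, xi in zip(responses, x))
        if sse < best[0]:
            best = (sse, kd, rmax)
    return best[1], best[2]


# Hypothetical two-fold dilution series (µM) and noise-free synthetic responses.
concs = [300.0 / 2 ** i for i in range(8)]
reference = [5.0] * len(concs)                        # W6/32-only surface
raw = [one_site(c, 20.0, 100.0) + 5.0 for c in concs]  # true KD 20 µM, Rmax 100 RU

corrected = subtract_reference(raw, reference)
kd_fit, rmax_fit = fit_one_site(concs, corrected)
print(f"KD ~ {kd_fit:.1f} µM, Rmax ~ {rmax_fit:.1f} RU")
```

A grid scan is used here only to keep the example dependency-free; a nonlinear least-squares routine would normally be preferred for real sensorgram data.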


29. Davis, I. W. et al. MolProbity: all-atom contacts and structure validation for proteins 
and nucleic acids. Nucleic Acids Res. 35, W375-W383 (2007). 
Rad51 paralogues Rad55-Rad57 balance the 
antirecombinase Srs2 in Rad51 filament formation 


Jie Liu, Ludovic Renault, Xavier Veaute, Francis Fabre, Henning Stahlberg & Wolf-Dietrich Heyer 


Homologous recombination is a high-fidelity DNA repair pathway. 
Besides a critical role in accurate chromosome segregation during 
meiosis, recombination functions in DNA repair and in the recovery 
of stalled or broken replication forks to ensure genomic stability. In 
contrast, inappropriate recombination contributes to genomic 
instability, leading to loss of heterozygosity, chromosome rearrange- 
ments and cell death. The RecA/UvsX/RadA/Rad51 family of 
proteins catalyses the signature reactions of recombination, homology 
search and DNA strand invasion. Eukaryotes also possess 
Rad51 paralogues, whose exact role in recombination remains to 
be defined. Here we show that the Saccharomyces cerevisiae 
Rad51 paralogues, the Rad55-Rad57 heterodimer, counteract the 
antirecombination activity of the Srs2 helicase. The Rad55-Rad57 
heterodimer associates with the Rad51-single-stranded DNA fila- 
ment, rendering it more stable than a nucleoprotein filament con- 
taining Rad51 alone. The Rad51-Rad55-Rad57 co-filament resists 
disruption by the Srs2 antirecombinase by blocking Srs2 transloca- 
tion, involving a direct protein interaction between Rad55-Rad57 
and Srs2. Our results demonstrate an unexpected role of the Rad51 
paralogues in stabilizing the Rad51 filament against a biologically 
important antagonist, the Srs2 antirecombination helicase. The 
biological significance of this mechanism is indicated by a complete 
suppression of the ionizing radiation sensitivity of rad55 or rad57 
mutants by concomitant deletion of SRS2, as expected for biological 
antagonists. We propose that the Rad51 presynaptic filament is 
a meta-stable reversible intermediate, whose assembly and dis- 
assembly is governed by the balance between Rad55-Rad57 and 
Srs2, providing a key regulatory mechanism controlling the ini- 
tiation of homologous recombination. These data provide a para- 
digm for the potential function of the human RAD51 paralogues, 
which are known to be involved in cancer predisposition and human 
disease. 

Rad51 protein and its homologues RecA, UvsX and RadA form 
nucleoprotein filaments with ssDNA that perform homology search 
and DNA strand invasion during homologous recombination. The 
Rad51 paralogues share the RecA core with the Rad51 protein featur- 
ing unique amino- and carboxy-terminal extensions (Supplementary 
Fig. 2), but themselves do not form filaments and are unable to perform 
homology search and DNA strand invasion. Whereas humans 
contain five paralogues (RAD51B, RAD51C, RAD51D, XRCC2, 
XRCC3), the budding yeast Saccharomyces cerevisiae contains two 
clearly identifiable paralogues, Rad55 and Rad57 (Supplementary 
Fig. 2). Rad55 and Rad57 in yeast as well as the five human RAD51 
paralogues have unique non-redundant functions in recombination, 
and mutations in any one of them lead to recombination defects, 
chromosomal instability, sensitivity to DNA damage, and meiotic 
defects. Defects in the budding yeast RAD55 and RAD57 genes lead 
to identical and epistatic phenotypes in DNA repair and recombination, 
consistent with the formation of a stable Rad55-Rad57 heterodimer. 
Rad55-Rad57 heterodimers were inferred to function as 
mediator proteins allowing assembly of the Rad51 nucleoprotein filament 
on ssDNA covered by the eukaryotic ssDNA-binding protein 
RPA. This suggested that Rad55-Rad57 are involved in the nucleation 
of the Rad51 filament, which is otherwise inhibited on RPA-covered 
ssDNA. This nucleation model is akin to the role of RecFOR or BRCA2 
in nucleating RecA or human RAD51 filaments. Rad51 filament 
formation in vivo can be monitored cytologically as Rad51 focus 
formation at the site of DNA damage. Unexpectedly, Rad51 focus 
formation after ionizing radiation in yeast was demonstrated to be 
independent of Rad55-Rad57 and formation of visible Rad55- 
Rad57 foci required Rad51 (ref. 10). These results are difficult to 
reconcile with the nucleation model derived from the biochemical 
results and suggest an alternative function of Rad55-Rad57 in vivo. 
To address the function of the Rad51 paralogues in yeast, we deter- 
mined the effect of Rad55-Rad57 on the stability of Rad51-ssDNA 
nucleoprotein complexes. Deletion mutants of the RAD55 or RAD57 
genes display a curious enhancement of some phenotypes at low 
temperature (in particular ionizing radiation sensitivity; see Supplementary 
Fig. 12), indicating that these proteins are involved 
in the stabilization of a molecular complex, probably the Rad51 pre- 
synaptic filament. To test this hypothesis, we incubated subsaturating 
amounts of Rad51 protein with ssDNA (1 Rad51 per 15 nucleotides) in 
the presence of substoichiometric amounts of Rad55-Rad57 hetero- 
dimer (1 Rad55-Rad57 per 4 Rad51) and challenged the filaments with 
buffer containing a high salt concentration (500 mM NaCl) (Sup- 
plementary Fig. 3a, b). Under these conditions, Rad51 does not main- 
tain stable complexes with ssDNA during electrophoresis. However, 
the presence of Rad55-Rad57 resulted in stable, Rad51-containing 
ssDNA complexes that withstood the salt challenge. In a comple- 
mentary approach, we examined the effect of Rad55-Rad57 on Rad51 
filament formation at near-physiological ionic strength (90 mM NaCl) 
(Fig. 1a, b). Under these conditions, only a fraction of the available 
Rad51 binds ssDNA, causing retarded mobility of the DNA (Fig. 1b, 
lane 3). Addition of substoichiometric amounts of Rad55-Rad57 
(1 Rad55-Rad57 per 6 Rad51 in lane 4 of Fig. 1b) led to the formation 
of a novel, supershifted complex that contained both Rad51 and 
Rad55-Rad57, as demonstrated by immunoblotting. Rad55-Rad57 
heterodimer alone binds to DNA under these conditions, leading to 
the formation of protein networks that are too large to enter the gel 
(Fig. 1b, lane 2). The results from both experiments (Fig. 1b; Sup- 
plementary Fig. 3) indicate that Rad55-Rad57 form a co-complex with 
Rad51 on ssDNA and stabilize Rad51-ssDNA filaments. Indeed, 
immunogold electron microscopy targeted towards Rad55 (glutathione 
S-transferase (GST)-tag; see Fig. 1c) directly visualized Rad55 asso- 
ciated with the Rad51-ssDNA filaments (Fig. 1d). Control experiments 
demonstrated the specificity of the gold labelling (Supplementary 
Table 1) with over 90% of the gold particles associated with clearly 
identifiable Rad51 filaments. The remainder may have associated 
with filaments too short to be scored or with free Rad55-Rad57. 
Gold particles were found either at the filament terminus (n = 40) or 

Figure 1 | Rad55-Rad57 is associated with and stabilizes Rad51-ssDNA 
filaments. a, Rad51-ssDNA filament assembly assay. EMSA, electrophoretic 
mobility shift assay. b, Rad51 (0.67 µM) with or without Rad55-Rad57 
(0.11 µM) was incubated with 4 µM φX174 ssDNA. The migration position of 
free protein was confirmed in controls lacking DNA (Supplementary Fig. 3c). 
IB, immunoblot. c, Reaction scheme of immunoaffinity gold labelling of Rad55. 
d, Electron microscopy images of gold-labelled Rad55 associated with Rad51-
ssDNA filament (1:3 Rad51:nucleotide; 2.34 µM Rad51 + 0.43 µM Rad55-
Rad57, 7 µM ssDNA). Scale bars, 100 nm. e, Models for the disposition of 
Rad55-Rad57 with the Rad51 filament. For simplicity, only model 2 is drawn in 
further illustrations. 


interstitially (n = 43) (Supplementary Table 1). Negative controls with 
Rad51 filaments assembled in the absence of Rad55-Rad57 showed 
negligible gold labelling (Supplementary Table 1). These data show that 
Rad55-Rad57 are associated with the Rad51-ssDNA filament, but the 
exact disposition of the heterodimer with the filament remains to be 
determined (see Fig. 1e). 
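
The stoichiometries quoted for these experiments can be recomputed from the concentrations given in the Figure 1 legend. The short sketch below does this arithmetic, assuming that the ssDNA concentrations are expressed in nucleotides (consistent with the 1:3 Rad51:nucleotide ratio stated for the electron microscopy samples). Variable names are illustrative.

```python
def ratio(a_um: float, b_um: float) -> float:
    """Return how many units of a are present per unit of b (both in µM)."""
    return a_um / b_um


# Fig. 1b (gel-shift assay): 0.67 µM Rad51, 0.11 µM Rad55-Rad57, 4 µM ssDNA.
nt_per_rad51_gel = ratio(4.0, 0.67)       # nucleotides per Rad51
rad51_per_dimer_gel = ratio(0.67, 0.11)   # Rad51 per Rad55-Rad57 heterodimer

# Fig. 1d (electron microscopy): 2.34 µM Rad51, 0.43 µM Rad55-Rad57, 7 µM ssDNA.
nt_per_rad51_em = ratio(7.0, 2.34)
rad51_per_dimer_em = ratio(2.34, 0.43)

# Gold-particle positions reported in the text.
terminal, interstitial = 40, 43
terminal_fraction = terminal / (terminal + interstitial)

print(f"Fig. 1b: 1 Rad51 per {nt_per_rad51_gel:.1f} nt, "
      f"1 Rad55-Rad57 per {rad51_per_dimer_gel:.1f} Rad51")
print(f"Fig. 1d: 1 Rad51 per {nt_per_rad51_em:.1f} nt, "
      f"1 Rad55-Rad57 per {rad51_per_dimer_em:.1f} Rad51")
print(f"Terminal gold particles: {terminal_fraction:.0%} of {terminal + interstitial}")
```

The Fig. 1b values reproduce the roughly 1:6 Rad55-Rad57:Rad51 ratio mentioned for lane 4, and the gold particles split almost evenly between terminal and interstitial positions.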

Salt stability of protein-DNA complexes is a valuable biochemical 
criterion. To establish biological significance, we tested whether 
Rad55-Rad57 heterodimers stabilize Rad51-ssDNA filaments against 
a biologically relevant destabilizer. The Srs2 helicase was identified as a 
negative regulator of homologous recombination, and genetic experi- 
ments indicated that Srs2 targets Rad51 protein''"’. Consistent with 
the genetic data, Srs2 translocates on ssDNA and disrupts Rad51 pre- 
synaptic filaments in vitro, providing a compelling mechanism for its 
function as an antirecombinase’*"'*. In the presence of 0.1 or 0.33 uM 
Srs2 approximately 70% of the Rad51 is dissociated as assessed by 
measuring Rad51 associated with ssDNA coupled to magnetic beads 
(Fig. 2a-c). The presence of substoichiometric amounts of Rad55- 
Rad57 (0.1 uM) enhanced the recovery of ssDNA-bound Rad51 by 
~twofold (from 31% to 60% in the presence of 0.33 1M Srs2). 
Rad55-Rad57 and Srs2 bound to Rad51-covered ssDNA in a quan- 
titative and concomitant manner (Fig. 2d). Together the data show 
that Rad55-Rad57 inhibit Srs2 when bound to DNA and not in solu- 
tion. Concentration-dependent inhibition of Srs2-mediated dissoci- 
ation of Rad51 from ssDNA by Rad55-Rad57 was also observed in a 
topology-based assay (Supplementary Figs 4, 5). 

To investigate the role of Rad55-Rad57 in antagonizing disrup- 
tion of Rad51 presynaptic filaments by Srs2 further, we used electron 
microscopy to examine nucleoprotein filaments directly (Fig. 3 and 
Figure 2 | Rad55-Rad57 stabilize Rad51-ssDNA filaments to resist 
disruption by Srs2. a, Pull-down assay measuring stability of Rad51-ssDNA 
complexes (1:3 Rad51:nucleotide, 1 μM Rad51 + 0.1 μM Rad55-Rad57) 
against disruption by Srs2 (0.1 or 0.33 μM). b, Rad51 remaining bound to 
ssDNA. c, Quantification of results in b and additional experiments. Shown are 
means ± 1 s.d., n = 3. d, Concomitant binding of Rad55-Rad57 and Srs2 to 
Rad51-covered ssDNA. Pull-down assay measuring stability of Rad51-ssDNA 
complexes (1:3 Rad51:nucleotide, 1 μM Rad51 + 0.2 μM Rad55-Rad57) 
against disruption by 0.33 μM Srs2. Top, pull-downs; bottom, supernatants. 


Supplementary Fig. 6). Rad51 filaments were assembled on a 600- 
nucleotide fragment of ssDNA and RPA was added to visualize free 
ssDNA. Consistent with previous observations, in the absence of 
Rad55-Rad57 Srs2 disrupts the Rad51-ssDNA filament efficiently, 
leading to binding of RPA to the newly exposed ssDNA (Fig. 3). 
Importantly, when substoichiometric amounts of Rad55-Rad57 were 
incubated with Rad51 and ssDNA, the filaments were stabilized 
against disruption by Srs2, as indicated by the significantly increased 
mean filament length. 

How do Rad55-Rad57 heterodimers block Srs2 from dissociating 
Rad51 from ssDNA? Srs2 is known to interact with Rad51 and trigger 
the Rad51 ATPase, leading to dissociation of Rad51 from ssDNA. We 
found that Rad55-Rad57 form a 1:1 complex with Srs2 (Fig. 4a) and 
have higher affinity to Srs2 than to Rad51 (Fig. 4b, c). Excess Rad51 
does not compete with Srs2 binding to Rad55-Rad57 (Supplementary 
Fig. 7). Moreover, Rad55-Rad57 heterodimers are able to simulta- 
neously bind Rad51 and Srs2 in a 1:1:1 stoichiometry (Fig. 4d and 
Supplementary Figs 7-9). We considered the possibility that Rad55- 
Rad57 inhibit the Srs2 ATPase activity and by that Srs2 translocation, 
but Srs2 ATPase activity is barely altered by the presence of Rad55- 
Rad57 (data not shown). Srs2 translocase/helicase activity is stimulated 
by Rad51 binding to DNA (Fig. 4e-g). Importantly, Rad55- 
Rad57 completely suppress this stimulatory effect of Rad51, leading to 
inhibition of the Srs2 helicase activity even at a fivefold molar excess of 
Srs2 over Rad55-Rad57 (Fig. 4f, g and Supplementary Fig. 10). This 
substoichiometric action of Rad55-Rad57 eliminates the possibility 
that Rad55-Rad57 inhibition functions by binding Srs2 in solution. 
Rad55-Rad57 only slightly inhibit Srs2 helicase in the absence of 
Rad51 (Fig. 4g and Supplementary Fig. 10c). Control experiments 
show that this effect depends on Srs2 translocating in the expected 
3’ to 5’ direction (Supplementary Fig. 10d), showing that Rad55- 
Rad57 inhibit Srs2 translocation on DNA to increase filament stability 
(Figs 1 and 2) and function (Supplementary Fig. 11). Direct visualization 
of human RAD51 filaments revealed that RAD51 is only able to 
form discontinuous short clusters on double-stranded DNA, as a result 
of frequent nucleation but limited extension. If this property holds 
Figure 3 | Rad55-Rad57 inhibit disruption of Rad51 presynaptic filaments 
by Srs2. a, RPA-ssDNA complex. b, Short (145 nm) Rad51-ssDNA filament. 
c, Long (350 nm) Rad51-ssDNA filaments. d, Quantification of electron 
microscopic analysis. For each reaction condition 300-400 filaments were 
analysed (2.34 μM Rad51, 7 μM 600-nucleotide ssDNA, + 0.43 μM Rad55-Rad57, 
+ 0.21 μM RPA, + 0.4 μM Srs2), and the means (9) ± 1 s.d. and 
distributions of filament length classes are shown. Scale bars, 100 nm. White 
arrows indicate RPA-ssDNA complexes and red arrows indicate Rad51 
filaments. 


true for ssDNA, the formation of a co-filament with Rad51 by Rad55- 
Rad57 might provide a mechanism to form extended Rad51 filaments. 
This could also explain the increase in Rad55-Rad57 focus intensity 
over time after ionizing radiation exposure, and is consistent with the 
dependence of Rad55-Rad57 foci on Rad51 (ref. 10). 

Our biochemical data are consistent with a model (Supplementary 
Fig. 1) whereby Rad51 presynaptic filament formation is modulated by 
a balance between the stabilizing function of Rad55-Rad57 and the 
destabilizing function of Srs2 antirecombinase. This model predicts that 
a deletion of SRS2 should suppress the phenotypes caused by defects in 
Rad55-Rad57. In fact, srs2Δ completely suppresses the ionizing radiation 
sensitivity of rad57 and rad55 mutations in quantitative survival 
assays (Supplementary Fig. 12), consistent with semiquantitative results 
using rad57 (ref. 20). However, srs2Δ only mildly suppresses the methyl 
methanesulphonate sensitivity (Supplementary Fig. 13) and recombination 
defect (Supplementary Fig. 14) of a rad55 mutation, consistent 
with previous rad57 data. The difference in suppression is likely 
related to a difference in substrates: ionizing radiation-induced DNA 
damage requires primarily double-strand break repair, whereas methyl 
Figure 4 | Rad55-Rad57 interact with Srs2 and inhibit Srs2 helicase. a, Pull-down 
with 4 nM (1 pmol) Rad55-Rad57 and 2.7, 8 or 16 nM Srs2. b, Pull-down 
with 4 nM Rad55-Rad57 and 8 or 16 nM Rad51. c, Quantification of results in 
a and b and additional experiments. d, Pull-down with 4 nM Rad55-Rad57 and 
8 nM Srs2 + 40 nM Rad51. GST was used as control. S, supernatant; W, wash; E, 
eluate. e, Helicase assay. f, Rad51 (28 nM) with or without Rad55-Rad57 (25 nM) 
was incubated with 1.5 nM 3’-tailed substrate before addition of 120 nM Srs2 
protein. g, Product yields at 20 min. Means ± 1 s.d., n = 3, are shown. 


methanesulphonate-induced DNA damage and sister chromatid 
recombination require gap repair (Supplementary Fig. 1). We propose 
that the Rad51 presynaptic filament is a meta-stable reversible inter- 
mediate, whose dynamics in yeast are partially controlled by the balance 
of the filament-stabilizing activity of Rad55-Rad57 and the filament- 
destabilizing activity of the Srs2 helicase (Supplementary Fig. 1). This 
balance is likely to be influenced by the multiple post-translational 
modifications that have been identified to regulate Rad55-Rad57 
(ref. 21) and Srs2 (ref. 22) functions (Supplementary Fig. 1). 
Together with the local availability of SUMO-PCNA, which specifically 
recruits Srs2 (refs 7*-*°), post-translational modifications may deter- 
mine the balance between recombination and antirecombination in 
wild-type cells and explain the various degrees of suppression observed 
in the srs2 rad55 (rad57) double mutants that depend on the type of 
DNA damage or genetic endpoint (double-strand break versus replica- 
tion-fork-associated gap in Supplementary Fig. 1). 

The human RAD51 paralogues have important roles in tumour 
suppression and human disease. Our studies established an unprecedented 
mechanism of anti-antirecombination that may serve as a 
paradigm for the mechanism of action of the five human RAD51 
paralogues. The diversification of the human RAD51 paralogues 
may reflect the multiplicity of human motor proteins that may disrupt 
RAD51 presynaptic filaments, including the RecQ-like helicases BLM 
and RECQL5 as well as FBH1 and FANCJ, or indicate additional 
functions during recombinational repair. 


METHODS SUMMARY 


Purification of yeast Rad51, Rad55-Rad57, RPA and Srs2, the biochemical assays 
and the electron microscopy analysis are detailed in Methods. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 


Protein purification. Yeast Rad51, RPA and Srs2 proteins were purified as 
described. The purification of Rad55-Rad57 was adapted from a previously 
published protocol. Yeast cells overexpressing GST-Rad55-His6-Rad57 were 
grown and harvested as described. Cells were disrupted in Buffer B containing 
10 mM Na2HPO4, 1.8 mM KH2PO4, 2.7 mM KCl, 1 M NaCl, 10% (v/v) glycerol, 
10 mM β-mercaptoethanol and protease inhibitor cocktail (1 mM PMSF, 2 μM 
leupeptin, 1 μM pepstatin A and 1 mM benzamidine) using glass beads (0.5 mm 
glass beads; BioSpec Products, Inc.). The cell lysate was centrifuged at 40,000 r.p.m. 
for 45 min using a Ti50.2 rotor. The supernatant was collected and loaded onto a 
pre-equilibrated Glutathione Sepharose 4B column (GE Healthcare). After 
washing with Buffer A (20 mM Tris-HCl pH 7.5, 1 mM EDTA, 1 M NaCl, 5 mM 
β-mercaptoethanol and 10% glycerol), the GST-tagged proteins were eluted with 
Buffer A containing 20 mM reduced glutathione plus protease inhibitor cocktail. 
Fractions containing the GST-Rad55-His6-Rad57 heterodimer, as determined by 
10% SDS-PAGE, were pooled and dialysed against Buffer C (50 mM NaH2PO4 
pH 8.0, 1 M NaCl and 10% glycerol) containing the protease inhibitor cocktail. 
Then the pool was loaded onto a pre-equilibrated Ni-NTA agarose column 
and washed with Buffer C plus protease inhibitor cocktail. The bound complexes 
were eluted with Buffer C containing 0.5 M NaCl, 0.1 mM PMSF and 250 mM 
imidazole, and analysed by 10% SDS-PAGE. Fractions containing stoichiometric 
Rad55-Rad57 heterodimer were pooled, concentrated, dialysed into the storage 
buffer containing 20 mM Tris-HCl pH 7.5, 0.5 M NaCl, 0.1 mM EDTA, 1 mM 
DTT and 10% glycerol, and then stored in aliquots at −80 °C. The absence of 
contaminating enzymatic activities and DNA in protein preparations was verified 
as described. 

Purification of 600-nucleotide ssDNA. As published previously, 600-bp 
dsDNA fragments biotinylated at one 5’ end were generated by PCR from 
PstI-linearized φX174 DNA using primers WDHY427 5’-TTATCGAAGCGCGCATAAAT-3’ 
and 5’-biotinylated WDHY431 5’-GTCTTCATTTCCATGCGGTG-3’. 
The biotinylated dsDNA was loaded onto a HiTrap Streptavidin HP 
column (Amersham Biosciences), and non-biotinylated single-stranded DNA 
was eluted with 60 mM NaOH. 

Rad51-ssDNA filament assembly assay. In Fig. 1b, Rad51 (0.67 μM) was incubated 
with 4 μM ssDNA, in the presence or absence of 0.11 μM Rad55-Rad57, in 
buffer R containing 20 mM triethanolamine pH 7.5, 4 mM magnesium acetate, 
2.5 mM ATP, 25 μg ml⁻¹ BSA, 1 mM DTT, 90 mM NaCl and 5% glycerol for 
10 min. Then 0.25% glutaraldehyde was used to crosslink the protein-DNA complexes 
for 15 min. The complexes were separated on a 0.5% agarose gel, stained 
with SYBR Gold, transferred to nitrocellulose membrane, and blotted with rabbit 
polyclonal anti-Rad51 or anti-Rad55 antibodies. 

Rad51-ssDNA filament salt challenge assay. In Supplementary Fig. 3, Rad51 
(0.267 μM) was incubated with 4 μM ssDNA, in the presence or absence of 
0.067 μM Rad55-Rad57, in buffer R containing 20 mM triethanolamine pH 7.5, 
4 mM magnesium acetate, 2.5 mM ATP, 25 μg ml⁻¹ BSA, 1 mM DTT and 5% 
glycerol for 10 min. Then 5 M stock NaCl solution was added to the reaction to 
reach a final concentration of 500 mM for a further incubation of 30 min. 
Glutaraldehyde (0.25%) was used to crosslink the protein-DNA complexes for 
15 min. Complexes were separated on a 0.5% agarose gel and stained with SYBR 
Gold. Proteins were transferred to nitrocellulose membrane and blotted with anti-Rad51 
antibodies. All DNA concentrations refer to nucleotides (ssDNA) or base 
pairs (dsDNA). 
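
As a worked illustration of the salt-challenge step, the volume of 5 M NaCl stock needed to bring a reaction to 500 mM follows from conservation of solute. The sketch below is not part of the published protocol: the 10 μl starting volume, the assumption that the reaction contains no NaCl before the addition, and the function name are all hypothetical.

```python
def stock_volume_to_add(reaction_volume, stock_conc, final_conc, initial_conc=0.0):
    """Volume of stock to add so that the mixture reaches final_conc.

    Solves (initial_conc * V0 + stock_conc * v) / (V0 + v) = final_conc for v.
    All concentrations share one unit; the result has the unit of reaction_volume.
    """
    if not initial_conc <= final_conc < stock_conc:
        raise ValueError("need initial_conc <= final_conc < stock_conc")
    return reaction_volume * (final_conc - initial_conc) / (stock_conc - final_conc)


# Hypothetical 10 ul reaction, 5 M stock, 0.5 M (500 mM) target.
v = stock_volume_to_add(10.0, 5.0, 0.5)
# Check by recomputing the concentration of the mixture.
final = 5.0 * v / (10.0 + v)
```

Under these assumptions about 1.1 μl of stock is added per 10 μl of reaction; any NaCl carried in with protein storage buffers would be passed as `initial_conc`.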

Protein binding to ssDNA immobilized on magnetic beads. In Fig. 2b, a 
5’-biotinylated oligonucleotide was immobilized onto magnetic streptavidin 
beads as previously described. The oligo sequence is 5’-CCCCCCCCCCCCCA 
AGATAATTTTTCGACTCATCAGAAATATCCGAAAGTGTTAACTTCTGCG 
TCATGGAAGCGATAAAACTC-3’. In experiments containing Srs2, a 10-μl slurry 
of beads containing 3 μM ssDNA was incubated with 1 μM Rad51 in the presence 
or absence of 0.1 μM Rad55-Rad57 in buffer containing 20 mM triethanolamine, 
5 mM magnesium acetate, 4 mM ATP, 25 μg ml⁻¹ BSA, 1 mM DTT, 5% glycerol 
and 25 mM NaCl for 10 min at 22 °C. Then 0.1 or 0.33 μM Srs2 protein was added 
and further incubated for 10 min. The beads were washed, and bound proteins 
were eluted and quantified as described. Background protein binding was 
typically less than 3%. 

Topology-based assay for Rad51 dissociation. A published protocol was modified 
slightly for this assay using M13mp18 ssDNA. In Supplementary Fig. 4, 375 nM 
Rad51, with 0, 80 and 120 nM Rad55-Rad57, were incubated with 9 μM circular 
M13mp18 ssDNA in 25 μl of buffer containing 20 mM triethanolamine pH 7.5, 
4 mM magnesium acetate, 25 μg ml⁻¹ BSA, 1 mM DTT, and an ATP-regenerating 
system consisting of 2.5 mM ATP, 20 U ml⁻¹ creatine kinase, and 20 mM creatine 
phosphate for 10 min at 30 °C. Then, 100 nM Srs2 and 150 nM RPA were added 
and incubated for 10 min, before the addition of topologically relaxed pUC19 
dsDNA (7 μM in base pairs) and wheat germ DNA topoisomerase I (3 U). After 
another 10 min incubation, reactions were stopped by addition of 4 μl stop solution 
consisting of 1% (w/v) SDS, 75 mM EDTA, 10 mg ml⁻¹ protease K and further 
30 min incubation at 37 °C. DNA species were resolved by electrophoresis on a 1% 
TBE-agarose gel and visualized using ultraviolet transillumination after ethidium 
bromide staining. The results were quantified using ImageQuant. 

Protein interaction assays. GST-Rad55-His6-Rad57 (4 nM) or 30 nM GST (GE 
Healthcare) were incubated with increasing amounts of either Srs2 or Rad51 in 
buffer P containing 25 mM Tris-HCl (pH 7.5), 10 mM magnesium acetate, 50 mM 
NaCl, 1mM DTT, 10% glycerol and 0.05% NP-40 for 1h at room temperature 
(Fig. 4a-d). Equilibrated and BSA-treated Glutathione-Sepharose 4B beads were 
added to the mixture and incubated for 1h. The beads and supernatant were 
separated by centrifugation and the beads were washed twice with binding buffer 
P. The pulled-down protein complexes were eluted by boiling at 95 °C for 3 min in 
10 μl SDS-PAGE loading buffer, separated through a 10% SDS-PAGE gel, and the 
protein bands were visualized through immunoblots and quantified by 
ImageQuant. In Fig. 4a, b, 1/16th of the supernatant and wash were loaded. In 
Fig. 4d, 1/7th of the supernatant and wash were loaded. For the competition protein 
binding assay (Supplementary Figs 7 and 8), the two proteins were incubated for 
30 min before the addition of an increasing amount of the third challenging protein, 
as specified in the diagrams. After another 30-min incubation, equilibrated and 
BSA-treated Glutathione-Sepharose 4B beads were added to the mixture and incubated 
for 1 h. Analysis and quantification were performed as described above. The 
anti-Rad51, -Rad55 and -Rad57 antibodies were generated in rabbits; the anti-Srs2 
antibody was purchased from Santa Cruz Biotechnology. 

Helicase assay. The assay followed a published protocol and the substrates were 
prepared exactly as described before. In Fig. 4f, 28 nM Rad51 with 0 or 25 nM 
Rad55-Rad57 were incubated with 1.5 nM oligo substrate with 3’ tail in buffer 
containing 20 mM triethanolamine pH 7.5, 4 mM magnesium acetate, 25 μg ml⁻¹ 
BSA, an ATP-regenerating system consisting of 2.5 mM ATP, 20 U ml⁻¹ creatine 
kinase and 20 mM creatine phosphate, as well as either 1 mM DTT and 40 mM 
NaCl (Fig. 4e, g and Supplementary Fig. 10c, d) or 5 mM DTT and 10 mM NaCl 
(Supplementary Fig. 10a) for 10 min at 30 °C. Then 120 nM Srs2 protein was 
added to initiate the helicase reaction. After 20 min incubation, the reactions were 
stopped by adding 4.5 μl stop buffer containing 150 mM EDTA, 2% SDS, 163 nM 
unlabelled oligo, and 4.3 mg ml⁻¹ protease K into 9 μl reaction sample. The DNA 
species were separated through electrophoresis on a 10% TBE-PAGE gel, which 
was dried and analysed by a Storm phosphorimager. The bands were quantified by 
densitometry using ImageQuant. 

DNA strand exchange assay. In Supplementary Fig. 11, Rad51 (3.3 μM) was 
incubated with 0.3 μM Rad55-Rad57 or the corresponding amount of Rad55-Rad57 
storage buffer and 10 μM φX174 ssDNA for 15 min at 30 °C in buffer 
containing 30 mM Tris-acetate (pH 7.5), 4 mM magnesium acetate, 75 mM 
NaCl, 1 mM DTT, 2.5 mM ATP, 50 μg ml⁻¹ BSA, 20 mM phosphocreatine and 
80 ng μl⁻¹ creatine kinase. RPA (0.56 μM) and 0, 333, 222, 167 or 125 nM Srs2 were 
added, and incubated for another 30 min. Then 10 μM (bp) PstI-linearized φX174 
dsDNA and 4.8 mM spermidine were added and further incubated for 120 min. 
Samples were deproteinized and separated by electrophoresis on a 0.8% TBE-agarose 
gel. Images were recorded using a FluorChem8900 imaging system 
(Alpha Innotech) after staining with SYBR-Gold (Invitrogen), and quantified with 
ImageQuant. Percentage of joint molecule (JM) was calculated according to the 
equation JM% = (JM/1.5)/(JM/1.5 + NC + dsDNA). Percentage of product 
formation was calculated according to the equation 
product% = (JM/1.5 + NC)/(JM/1.5 + NC + dsDNA). NC, nicked circle. 
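
The two quantification equations above can be written as a short function. This is a minimal sketch rather than the authors' analysis script: the function name and the example band intensities are hypothetical, and the multiplication by 100 (to express the ratios as percentages) is an assumption about how the values were reported.

```python
def strand_exchange_percentages(jm, nc, dsdna):
    """Return (JM%, product%) from band intensities in arbitrary units.

    The joint-molecule signal is divided by 1.5, as in the equations in the
    text, before being compared with nicked circle (NC) and dsDNA signals.
    """
    jm_corrected = jm / 1.5
    total = jm_corrected + nc + dsdna
    if total <= 0:
        raise ValueError("total signal must be positive")
    jm_pct = 100.0 * jm_corrected / total
    product_pct = 100.0 * (jm_corrected + nc) / total
    return jm_pct, product_pct


# Hypothetical band intensities: JM = 150, NC = 100, dsDNA = 300.
jm_pct, product_pct = strand_exchange_percentages(150.0, 100.0, 300.0)
```

With these illustrative inputs the corrected JM signal is 100 out of a total of 500, so JM% is 20 and product% is 40.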

Electron microscopy. To assemble the protein-DNA filament, 2.34 μM Rad51 
protein, in the presence or absence of 0.43 μM Rad55-Rad57, was incubated with 
7 μM 600-nt ssDNA (+) strand for 10 min at 30 °C in 20 mM triethanolamine 
pH 7.5, 4 mM magnesium acetate, 1 mM DTT and 3 mM ATP. RPA (0.21 μM) 
was added and incubated for another 10 min. Lastly, 0.4 μM Srs2 or buffer control 
was added and incubated for 10 min. The reaction mixtures were diluted 20-fold in 
10 mM Tris-HCl pH 7.5, 50 mM NaCl and 5 mM MgCl2 without chemical fixation. 
The samples were adsorbed onto 400 mesh carbon-coated copper grids (Ted 
Pella), negatively stained with 2% (w/v) uranyl acetate, blotted, and air-dried. 
Grids were imaged in a JEOL JEM-1230 transmission electron microscope 
(JEOL). Images of Rad51-filaments were randomly collected from different areas 
on the grid. 6-10 grids were used for each condition. Images were recorded at a 
nominal magnification of ×40,000 under minimum dose procedures on a Tietz 
2,048 × 2,048 pixel CCD camera (TVIPS, Germany). Immunoaffinity gold labelling 
of GST-Rad55, as shown in Fig. 1d, was adapted from a published protocol. 
In brief, Rad51-Rad55-Rad57-ssDNA complexes were assembled as described 
above and crosslinked with 0.25% glutaraldehyde for 20 min, before deposition on 
grids. Grids were blocked in 50 μg ml⁻¹ BSA in TBST for 30 min, and then incubated 
with goat anti-GST antibody (GE Healthcare) for 30 min. After three 5 min 
washes with 50 μg ml⁻¹ BSA in TBST, the grids were incubated in TBST plus a 1:5 
dilution of gold particles dressed with rabbit anti-goat antibody (BioAssay Works). 
After two 5 min washes in 50 μg ml⁻¹ BSA in TBST and one 5 min wash in 5 mM 
magnesium acetate, grids were stained with 2% (w/v) uranyl acetate before 
imaging. 

Saccharomyces cerevisiae strains. Strains used are listed in Supplementary Table 2. 
Recombination assay. Spontaneous recombination rates between direct repeats 
were determined following a published fluctuation analysis protocol using the 
method of the median. The direct-repeat recombination substrate has two 
different ade2 alleles separated by plasmid sequences and the URA3 gene. 
Yeast strains were grown on YPD plates for 2 days at 30 °C for single colonies. 
For each strain, nine independent single colonies were randomly chosen and the 
entire colony was used to inoculate 4 ml YPD liquid culture. Liquid cultures were 
grown for 2-3 days at 30 °C to reach stationary phase. Cells were collected, washed 
with sterile H2O, and suspended in 1 ml sterile H2O. Aliquots (100 μl) of appropriate 
dilutions of each culture were spread on two plates each of SD-ADE-URA. Cells 
were incubated for 2 days at 30°C. For each culture, the number of colonies on 
YPD were counted and totalled to determine the total cell number. The number of 
colonies on SD-ADE-URA were counted to determine the median number of 
recombinants. For each strain, recombination rates were measured independently 
three times and the mean values with standard deviations are shown. 
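
The text names the method of the median but does not give the estimator. One common form, assumed here, is the Lea-Coulson relation r0/m − ln(m) − 1.24 = 0, where r0 is the median number of recombinants per culture and m is the number of recombination events per culture; the rate is then m divided by the number of cells per culture. The sketch below solves that relation by bisection. The function names and example counts are hypothetical, and the cited protocol may differ in detail.

```python
import math
from statistics import median


def events_per_culture(median_recombinants):
    """Solve r0/m - ln(m) - 1.24 = 0 for m by bisection (assumed estimator)."""
    r0 = float(median_recombinants)
    if r0 <= 0:
        raise ValueError("the median number of recombinants must be positive")

    def f(m):
        return r0 / m - math.log(m) - 1.24

    # f decreases monotonically in m; f(lo) > 0 and f(hi) < 0 bracket the root.
    lo, hi = 1e-12, max(r0, 1.0)
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def recombination_rate(recombinant_counts, cells_per_culture):
    """Events per cell per generation from per-culture recombinant counts."""
    return events_per_culture(median(recombinant_counts)) / cells_per_culture


# Hypothetical counts from nine independent cultures of 1e8 cells each.
counts = [4, 7, 8, 9, 10, 12, 15, 21, 40]
m = events_per_culture(median(counts))
rate = recombination_rate(counts, 1e8)
```

Because the median rather than the mean is used, a single "jackpot" culture (the value 40 in the example) has no effect on the estimate.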

MMS sensitivity assay. Yeast strains were grown overnight in liquid YPD to mid-log 
phase at 30 °C, and then diluted to OD600 nm = 1. Serial dilutions of these cell 
cultures were made with sterile H2O and spotted onto YPD plates with or without 
methyl methanesulphonate. Plates were incubated for 3 days at 30 °C or 5 days at 
22°C before photographing using a FluorChem8900 imaging system (Alpha 
Innotech). 

Ionizing radiation survival assay. Exponentially growing cells (1 × 10⁷ to 
2 × 10⁷ per ml) in YPD medium at 28 °C were collected by centrifugation, washed 
in cold saline (0.9% NaCl), sonicated and resuspended in saline at the desired 
concentration. The cell suspension was γ-irradiated in a ¹³⁷Cs irradiator delivering 
20 Gy min⁻¹. Aliquots of appropriate dilutions were spread on YPD-containing 
plates pre-warmed at either 23 °C or 34 °C. The plates were incubated at the 
corresponding temperature for 4 days (34 °C) or 6 days (23 °C) before counting 
the colonies. Platings were done in duplicate. The experiments were repeated at 
least three times, and the result of one typical assay is shown. 
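
Survival in an assay of this kind is usually expressed as the viable titre of the irradiated suspension relative to the unirradiated control. The sketch below shows that arithmetic for duplicate platings. It is an illustration under stated assumptions, not the authors' calculation: the function names, colony counts, dilution factors and the 0.1 ml plated volume are all hypothetical.

```python
def viable_titre(colony_counts, dilution_factor, plated_volume_ml):
    """Colony-forming units per ml from replicate plates of one dilution."""
    mean_colonies = sum(colony_counts) / len(colony_counts)
    return mean_colonies * dilution_factor / plated_volume_ml


def surviving_fraction(irradiated_titre, control_titre):
    """Fraction of cells that still form colonies after irradiation."""
    return irradiated_titre / control_titre


# Hypothetical duplicate platings of 0.1 ml each.
control = viable_titre([100, 120], dilution_factor=1e4, plated_volume_ml=0.1)
irradiated = viable_titre([30, 34], dilution_factor=1e2, plated_volume_ml=0.1)
fraction = surviving_fraction(irradiated, control)
```

Plating a less diluted sample after irradiation, as in the example, keeps colony counts in a countable range when survival is low.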
Resolving the evolutionary relationships of molluscs with phylogenomic tools 


Stephen A. Smith, Nerida G. Wilson, Freya E. Goetz, Caitlin Feehery, Sonia C. S. Andrade, Greg W. Rouse, 
Gonzalo Giribet & Casey W. Dunn 


Molluscs (snails, octopuses, clams and their relatives) have a great 
disparity of body plans and, among the animals, only arthropods 
surpass them in species number. This diversity has made Mollusca 
one of the best-studied groups of animals, yet their evolutionary 
relationships remain poorly resolved. Open questions have 
important implications for the origin of Mollusca and for morphological 
evolution within the group. These questions include 
whether the shell-less, vermiform aplacophoran molluscs diverged 
before the origin of the shelled molluscs (Conchifera) or lost 
their shells secondarily. Monoplacophorans were not included in 
molecular studies until recently, when it was proposed that they 
constitute a clade named Serialia together with Polyplacophora 
(chitons), reflecting the serial repetition of body organs in both 
groups. Attempts to understand the early evolution of molluscs 
become even more complex when considering the large diversity of 
Cambrian fossils. These can have multiple dorsal shell plates and 
sclerites or can be shell-less but with a typical molluscan radula 
and serially repeated gills. To better resolve the relationships 
among molluscs, we generated transcriptome data for 15 species 
that, in combination with existing data, represent for the first time 
all major molluscan groups. We analysed multiple data sets con- 
taining up to 216,402 sites and 1,185 gene regions using multiple 
models and methods. Our results support the clade Aculifera, con- 
taining the three molluscan groups with spicules but without true 
shells, and they support the monophyly of Conchifera. Mono- 
placophora is not the sister group to other Conchifera but to 
Cephalopoda. Strong support is found for a clade that comprises 
Scaphopoda (tusk shells), Gastropoda and Bivalvia, with most ana- 
lyses placing Scaphopoda and Gastropoda as sister groups. This 
well-resolved tree will constitute a framework for further studies of 
mollusc evolution, development and anatomy. 

Since the first animal phylogenies based on molecular data, many 
researchers have struggled to resolve mollusc phylogenies even as 
taxon sampling improved (see Fig. 1 for some hypotheses that have 
been proposed). Little support, if any, was found for the monophyly of 
Mollusca or most of its larger subclades. Better results were achieved 
for some internal relationships of these groups, including Polyplaco- 
phora, Bivalvia, Cephalopoda, Scaphopoda and Gastropoda, although 
often with difficulties recovering monophyly of the two largest clades, 
the gastropods and bivalves. Unfortunately, fundamental questions 
in mollusc evolution remain largely unanswered by the molecular 
and morphological data. These questions include whether the aplacophoran 
molluscs are monophyletic or paraphyletic. There has also 
been conflicting evidence for the placement of Polyplacophora, which 
has been placed with the aplacophorans (forming the clade Aculifera), 
as the sister group to the shelled molluscs (forming the clade Testaria) 
or as the sister group to Monoplacophora (forming the clade Serialia). 
In addition, many hypotheses have been proposed for the interrela- 
tionships of the conchiferan groups. The extensive fossil record of 


Mollusca (which dates back to the Cambrian), combined with the 
numerous Palaeozoic forms that are considered stem-group molluscs 
and the lack of resolution in targeted-gene approaches to molluscan 
phylogenetics, pointed towards a possible rapid radiation with little 
phylogenetic signal left in the genomes of molluscs. However, the same 
has been argued for the radiation of Metazoa in the Cambrian or 
earlier, but large increases in gene representation using phylogenomic 
analyses have clearly ameliorated this problem and identified 
relationships that seemed impossible to resolve with target-gene 
Figure 1 | Selected hypotheses of extant molluscan relationships and 
relevant taxa. Phylograms based on the hypotheses of Scheltema (a), Salvini-Plawen 
and Steiner (b) and Waller (c). Most controversy centres on the 
monophyly of Aplacophora, the relationships within Conchifera and the 
placement of Polyplacophora (for example, in Aculifera versus in Testaria). 
approaches. We applied the same principles to Mollusca, one of the 
most challenging problems to solve in animal phylogenetics. 

Only phylogenomic analyses have been able to recover molluscan 
monophyly with high support; however, few molluscs were included 
in earlier analyses, and not all of the major subclades were represented. 
Therefore, little could be concluded about the interrelationships of the 
major molluscan groups. Morphology-based cladistic analyses have 
often relied on ‘idealized’ composite ground patterns to represent 
entire clades, a practice that has now been largely replaced with 
the use of exemplar species and more detailed character descriptions. 
But an analysis of molluscan morphological features coding real 
exemplars has yet to be published, and the exemplar approach is much 
more amenable to molecular data. 

Analyses of our broadly sampled, new phylogenomic data (see 
Methods, Supplementary Table 1 and Supplementary Fig. 1) result in 
a well-resolved and highly supported phylogeny of Mollusca (Fig. 2), in 
contrast to all previous molecular attempts. These results are consistent 
across analytical methods, phylogenetic inference programs, 
matrices that vary in occupancy and the number of genes considered 
(Fig. 2 and Supplementary Figs 2-6 and 9), and the inclusion of different 
outgroup taxa (Supplementary Fig. 7). 

Our results (Fig. 2) show a sister group relationship between the 
aculiferan molluscs and the conchiferan groups. Aculifera includes 
Polyplacophora as the sister group to the two aplacophoran groups 
(Neomeniomorpha and Chaetodermomorpha). This topology lends 
support to the idea that the vermiform Aplacophora are not plesiomorphic 
but are derived from plated Palaeozoic molluscs such as 
Acaenoplax. The aculiferans are characterized by spicules and dorsal 
shell plates. Chitons have eight dorsal shell plates, but their larva has an 
anlagen with seven rows of dorsal papillae, as observed in the serially 
arranged spiculoblasts of a chaetodermomorph larva, a character that 
may constitute a synapomorphy of the clade. 

Conchifera is supported as a clade, suggesting that true shells may 
have originated only once, perhaps by the concentration of a diffuse 
shell gland into a single zone of the mantle (two zones in bivalves), at 
least as defined by the role of engrailed during organogenesis. The 
support here for Conchifera rejects the recent Serialia hypothesis. 
Comparing the site likelihoods in analyses in which Serialia is constrained 
with those in which it is not constrained reveals that there 
are many more characters that are incongruent with Serialia than support 
Serialia (Supplementary Fig. 8). Monoplacophora is not, however, 
the sister group of all other Conchifera, as has been suggested by most 
authors, and is instead the sister group to Cephalopoda, as has been 
proposed based on some palaeontological data. Many palaeontologists 
have accepted the monoplacophoran ‘ancestry’ of Cephalopoda, 
although this relationship has been rejected by neontologists, who 
Figure 2 | Phylogram of the RAxML maximum likelihood analysis of the big 
matrix (216,402 amino acids) under the WAG+Γ model. Support values for 
the topology obtained from four analyses are listed as percentages in the order 
A/B/C/D. A is the bootstrap support from RAxML analysis under the WAG 
model for the big matrix. B is the bootstrap from RAxML analysis under the 
consider that cephalopods and gastropods share important morphological 
features such as the presence of cephalic eyes, the isolation of the 
head from the visceral mass, the terminal position of the mantle cavity 
and the occurrence of muscle antagonistic systems. The presence of 
multiseptate shells in fossil Hypseloconidae monoplacophorans, a 
character that is found in Nautilus and fossil cephalopods, has been 
interpreted as supporting this relationship between Cephalopoda and 
Monoplacophora. The presence of two pairs of gills, kidneys and atria 
in the chambered Nautilus has been interpreted as an indication that 
secondary simplification took place during the early evolution of 
cephalopods from an ancestor with serially repeated structures. 
This interpretation and the present trees suggest that the most recent 
common ancestor of Cephalopoda and Monoplacophora had some 
serially repeated structures. 
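
The site-likelihood comparison used above to evaluate the Serialia constraint can be illustrated with a simple tally. The sketch assumes that per-site log-likelihoods are already available for the unconstrained tree and for the best tree under the constraint; the function name, the result keys and the example values are hypothetical, and the procedure behind Supplementary Fig. 8 may differ.

```python
def compare_site_likelihoods(unconstrained, constrained, threshold=0.0):
    """Count sites that favour each tree and sum the overall difference.

    Both inputs are sequences of per-site log-likelihoods in the same site
    order. A site counts against the constraint when its unconstrained
    log-likelihood exceeds the constrained one by more than `threshold`.
    """
    if len(unconstrained) != len(constrained):
        raise ValueError("both analyses must cover the same sites")
    diffs = [u - c for u, c in zip(unconstrained, constrained)]
    return {
        "against_constraint": sum(1 for d in diffs if d > threshold),
        "favouring_constraint": sum(1 for d in diffs if -d > threshold),
        "delta_lnL": sum(diffs),
    }


# Hypothetical per-site values for four alignment columns.
result = compare_site_likelihoods(
    [-1.0, -2.0, -3.0, -1.5],
    [-1.5, -2.5, -2.0, -1.5],
)
```

Counting sites and summing differences answer different questions: in the example two sites conflict with the constraint and one supports it, yet the summed difference is zero because the single supporting site has a larger effect.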

The internal resolution of Cephalopoda is in agreement with all of 
the current hypotheses, with the chambered Nautilus forming the 
sister group of Coleoidea, and also identifies the monophyly of 
Decapoda. Scaphopoda, Gastropoda and Bivalvia form a clade with 
thick multilayered shells, but this clade has received little attention in 
the literature. Most morphological hypotheses place scaphopods as 
the sister group to bivalves in a clade named Diasoma and, recently, 
molecular and developmental data have favoured a cephalopod- 
scaphopod relationship. Although there is strong support for the 
placement of Scaphopoda as the sister group to Gastropoda in 
maximum likelihood analyses of the big matrix (Fig. 2 and Supplemen- 
tary Figs 2 and 4), maximum likelihood analyses of the small matrix 
recover this same relationship but with less support (Fig. 2 and Sup- 
plementary Figs 3 and 5). Bayesian analyses using the site-heterogeneous 
CAT model of protein evolution also place Scaphopoda as the sister 
group to Gastropoda, with a posterior probability of 89% (Supplemen- 
tary Fig. 9). 

Within Bivalvia, maximum likelihood analyses and Bayesian analyses 
under the Whelan and Goldman (WAG) model support the 
monophyly of Protobranchia, which includes bivalves with plesiomorphic 
ctenidia (gills comparable to those of many other molluscs). 
This contradicts some earlier bivalve phylogenies, based on fewer data, 
that proposed paraphyly of protobranchs but supports the traditional 
morphological views. Bayesian analyses with the CAT model are 
consistent with the monophyly of Protobranchia but do not provide strong support for it. 
The hypertrophied bivalve gill, which is responsible for filter feeding, 
had a single origin, and organisms with this type of gill constitute the 
well-recognized clade Autolamellibranchiata. Palaeoheterodonta (the 
group that includes freshwater pearl mussels) is the sister group to all 
other autolamellibranchiates, which can be divided into heterodonts 
and pteriomorphians. This hypothesis is similar to that proposed by 
some palaeontologists, although additional taxa, especially Neotrigonia, 
Anomalodesmata and Archiheterodonta, must be included before con- 
cluding more about the internal autolamellibranchiate relationships. 

Likewise, the internal relationships of Gastropoda, although still 
limited in their taxonomic representation (the group includes nearly 
100,000 living species), support some of the major divisions that are 
currently accepted. The patellogastropod Lottia is either the sister 
group to Vetigastropoda (as in Thiele’s Archaeogastropoda hypothesis) 
or the sister group to all other gastropods, depending on the 
data set that is analysed. The former alternative has been recovered in 
recent molecular analyses of gastropods. The two representatives of 
Caenogastropoda form a sister clade to the representatives of 
Heterobranchia, including opisthobranchs and pulmonates, as suggested 
in all of the modern analyses of gastropod relationships. 

For the first time, our data and analyses resolve the broad-scale 
relationships within Mollusca with strong support. This allows us to 
gain an understanding not only of the relationships of modern molluscs 
but also of the numerous Palaeozoic forms of molluscs. It also allows us 
to investigate several key characters that define the group. Molluscs are 
related to other animals with spiral development and a trochophore 
larva and have now been shown to share a close ancestor with annelids 
and brachiopods, both of which use chaetoblasts to produce chaetae. 
Spicules and chaetae may share a similar developmental mechanism. 
Likewise, the appearance of dorsal plates or shells in addition to sclerites 
is now well documented in halwaxiids, Acaenoplax and Polyplacophora. 
These features are generated by multiple rows of secretory papillae in 
chiton and aplacophoran larvae. They may be plesiomorphic among 
molluscs, especially if halwaxiids are interpreted as stem-group molluscs, 
but they could also be apomorphic for Aculifera. The condensation 
of such papillae into a single shell gland could be responsible for 
the origin of the conchiferan shell, arguably the single event that led to 
the extraordinary success of molluscs, first in the Cambrian oceans and 
later in many limnic and terrestrial environments. In addition to the 
presence of shell glands that can deposit calcium carbonate, the 
primitive mollusc may have had a rasping radula and serially repeated 
ctenidia along the mantle cavity, because both characters appear in the 
two lineages of extant molluscs, Aculifera and Conchifera, as well as in 
several extinct Palaeozoic stem molluscs. Like the arthropods, with 
their hardened exoskeletons, molluscs are true conquerors of our land 
and waters. 


METHODS SUMMARY 


New transcriptome data were collected for 14 mollusc species that had been selected 
to optimize taxonomic representation (Supplementary Table 1). Collecting efforts 
included an oceanographic campaign to collect members of the key taxon 
Monoplacophora. Using several protocols, messenger RNA was extracted, and 
cDNA samples were sequenced on a 454 Genome Sequencer FLX Titanium 
(Roche) or a Genome Analyzer IIx (Illumina). After assembly and translation, the 
sequences from all taxa were compared to each other with BLASTP. These pairwise 
comparisons were used to cluster genes into homologues using the algorithm MCL. 
The phylogenetic analyses divided sets of homologues into orthologues, which were 
aligned, trimmed and concatenated into two supermatrices that differed in the 
number of genes and the average fraction of genes available for each species. The 
‘small’ matrix consists of 301 genes that are present in at least 20 taxa. This matrix 
has 50% gene occupancy (that is, sequence data were available for an average of 50% 
of the genes across the taxa), 27% character occupancy (that is, 27% of the matrix 
consists of unambiguous amino acid data, with the remainder being missing data or 
alignment gaps) and is 50,930 sites in length. The ‘big’ matrix consists of 1,185 genes 
that are present in at least 15 taxa. This matrix has 40% gene occupancy, 21% 
character occupancy and is 216,402 sites long. Both matrices contain data for all 
of the 46 species that were included in the study. The matrices were analysed with the 
programs RAxML, MrBayes and PhyloBayes to infer relationships. 
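
The two occupancy statistics defined above can be illustrated with a minimal sketch. The toy alignment, the taxon names, the partition coordinates and the set of symbols treated as missing data are all assumptions made for illustration; this is not the pipeline used in the study.

```python
# Symbols treated as "no data" (gap, missing, ambiguous); an assumption for this sketch.
NO_DATA = set("-?X")


def occupancy(matrix, partitions):
    """Return (gene occupancy, character occupancy) for a concatenated alignment.

    matrix: dict mapping taxon name to its aligned sequence (all the same length).
    partitions: list of (start, end) column ranges, one per gene, end exclusive.
    """
    n_taxa = len(matrix)
    n_sites = len(next(iter(matrix.values())))
    genes_present = 0
    filled_cells = 0
    for seq in matrix.values():
        # Character occupancy counts every unambiguous amino acid cell.
        filled_cells += sum(ch not in NO_DATA for ch in seq)
        # A gene counts as present for a taxon if it has any unambiguous residue.
        for start, end in partitions:
            if any(ch not in NO_DATA for ch in seq[start:end]):
                genes_present += 1
    gene_occ = genes_present / (n_taxa * len(partitions))
    char_occ = filled_cells / (n_taxa * n_sites)
    return gene_occ, char_occ


# Hypothetical toy supermatrix: three taxa, two genes of four sites each.
toy = {
    "taxon_A": "MKLVARND",
    "taxon_B": "MK--????",
    "taxon_C": "----AR-D",
}
genes = [(0, 4), (4, 8)]
gene_occ, char_occ = occupancy(toy, genes)
print(f"gene occupancy {gene_occ:.2f}, character occupancy {char_occ:.2f}")
```

Character occupancy is always at most gene occupancy, because a gene that is present for a taxon may still contain gaps, which is consistent with the 50% versus 27% figures reported for the small matrix.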


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 


Taxon sampling and RNA isolation. The taxa were selected to optimize 
taxonomic representation within Mollusca. Collecting efforts included an oceanographic 
campaign to collect members of the key taxon Monoplacophora. New 
transcriptome data were collected for one outgroup taxon, Lingula anatina, and 
for 14 other taxa that were broadly sampled across Mollusca (Supplementary 
Table 1). All tissues were collected fresh and were prepared immediately or preserved 
for subsequent RNA work. Stored tissue was frozen (at −80 °C) or added to 
RNAlater (and frozen at −80 °C or −20 °C). Total RNA was isolated with TRI 
Reagent (Invitrogen) and further cleaned up with an RNeasy kit (QIAGEN), 
including a DNase I digestion step. 

Sequencing. Samples were sequenced on a 454 Genome Sequencer FLX Titanium 
(Roche) or a Genome Analyzer IIx (GA IIx, Illumina). The sample preparation 
protocol and sequencing technology used for each sample is listed in 
Supplementary Table 1. 

All 454 samples were sequenced by 454 Life Sciences on one-eighth of a 
Titanium flow cell. For five of the 454 samples, RNA was sent to the sequencing 
facility for library preparation and sequencing according to the standard 454 
cDNA protocols (these samples are marked Roche in the Library Protocol column 
of Supplementary Table 1). Nautilus pompilius mRNA was enriched by one round 
of binding to Dynabeads (Invitrogen); for the other specimens, total RNA was sent 
to the sequencing centre, where mRNA enrichment was performed. For four of the 
454 samples, full-length cDNA was prepared according to a template-switching 
protocol (these samples are marked TS in the Library Protocol column of 
Supplementary Table 1). Adaptors were modified to include restriction sites 
and were removed by cleavage before sequencing. An MmeI site was incorporated 
into the 3′ adaptor (5′-ATT CTA GAG CGC ACC TTG GCC TCC GAC TTT 
TCT TTT CTT TTT TTT TCT TTT TTT TTT VN-3′, where V and N are 
ambiguous nucleotides), and an SfiI site (5′-AAG CAG TGG TAT CAA CGC 
AGA GTG GCC ACG AAG GCC GGG-3′) or an AsiSI site (5′-AAG CAG 
TGG TAT CAA CGC AGA GTG CGA TCG CGG G-3′) was included in the 5′ 
adaptor. Titanium sequencing reagents were used for all samples. Additional 
expressed sequence tags for L. hyalina were sequenced with Sanger technology 
according to previously described methods. 

Most Illumina samples were prepared with the NEBNext mRNA Sample Prep 
kit (New England BioLabs), with size selection for 400 base pair (bp) products. 
These samples were sequenced (paired-end, 104 bp), with one per lane on an 
Illumina GA IIx at the Genomics Core Facility at Brown University. One sample 
(marked Fragmentase in the Library Protocol column of Supplementary Table 1) 
was prepared with a modified NEBNext mRNA protocol, in which the full-length 
cDNA was fragmented with NEBNext dsDNA Fragmentase (New England 
BioLabs) instead of the mRNA being fragmented. This sample was sequenced 
(paired-end, 150 bp) in a single lane on an Illumina GA IIx at the FAS Center 
for Systems Biology at Harvard University. 
Assembly. Publicly available data from the NCBI dbEST database were processed 
with a version of the PartiGene pipeline (version 3.0.5) that had been modified to 
run without user intervention. Trace Archive data were processed as described 
previously. 

Roche 454 data were assembled with the Newbler GS De novo Assembler (version 
2.3, Roche) with the flags ‘-cdna -nrm -nosplit’. In cases in which multiple splice 
variants (isotigs in Newbler terminology) were produced for a gene (an isogroup in 
Newbler terminology), a single exemplar splice variant was selected. The selected 
isotig was the one with the highest geometric mean of reads spanning each splice site 
between contigs. This roughly corresponds to the most abundant splice variant for 
the gene. Singletons that were not assembled by Newbler were assembled with 
CAP3 (version 10/15/07, with the options ‘-z 1’ and ‘-y 100’). The sequences that 
were assembled by Newbler, the sequences that were assembled by CAP3 and all 
singletons that were not assembled by either were used in subsequent analyses. 
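
The exemplar selection rule described above (highest geometric mean of reads spanning each splice site) can be sketched as follows. The isotig names and read counts are invented, and the handling of isotigs without junctions is an assumption of this sketch rather than a documented behaviour of the original scripts.

```python
from math import prod


def geometric_mean(junction_reads):
    """Geometric mean of the read counts spanning each splice junction.

    Returns 0.0 when there are no junctions or when any junction has no
    supporting reads (an assumption made for this sketch).
    """
    if not junction_reads or any(count <= 0 for count in junction_reads):
        return 0.0
    return prod(junction_reads) ** (1.0 / len(junction_reads))


def pick_exemplar(isogroup):
    """Return the isotig name with the highest geometric mean junction support.

    isogroup: dict mapping isotig name to a list of junction-spanning read counts.
    """
    return max(isogroup, key=lambda name: geometric_mean(isogroup[name]))


# Hypothetical isogroup with two splice variants.
isogroup = {
    "isotig00001": [12, 3, 48],   # one weakly supported junction
    "isotig00002": [20, 18, 25],  # evenly supported junctions
}
print(pick_exemplar(isogroup))
```

Using a geometric rather than arithmetic mean penalizes a variant in which a single junction is poorly supported, which is why it tracks the most abundant complete splice variant more closely.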

Illumina data were assembled with Velvet (version 1.0.12) and Oases (version 
0.1.15). Insert lengths for Oases were estimated with a 2100 Bioanalyzer (Agilent). 
Reads that did not have an average quality score of at least 35 were removed. We 
examined the assemblies over a range of k values (21–61, in increments of 10). We 
selected a k value of 61 for all samples, except for Octopus (for which we used 31). 
As for the 454 assemblies, we selected a single splice variant (transcript in Oases 
terminology) for each gene (locus in Oases terminology). To accomplish this, we 
developed a procedure whereby we chose transcripts that were at least 150 nucleo- 
tides, had a length of at least 85% of the longest transcript for the gene and had the 
highest read coverage. We ignored loci that had more than 50 transcripts, as these 
often appeared to be the result of misassembly. 
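
The transcript selection rule for Oases loci can be written compactly. The following is a minimal sketch under the thresholds stated above (at least 150 nucleotides, at least 85% of the longest transcript, highest coverage, loci with more than 50 transcripts ignored); the tuple layout and the example locus are hypothetical.

```python
def pick_transcript(transcripts, min_len=150, min_frac=0.85, max_per_locus=50):
    """Choose one transcript name for a locus, or None if the locus is skipped.

    transcripts: list of (name, length_nt, read_coverage) tuples for one locus.
    """
    if not transcripts or len(transcripts) > max_per_locus:
        return None  # empty locus, or too many transcripts (likely misassembly)
    longest = max(length for _, length, _ in transcripts)
    eligible = [
        t for t in transcripts
        if t[1] >= min_len and t[1] >= min_frac * longest
    ]
    if not eligible:
        return None
    # Among eligible transcripts, keep the one with the highest coverage.
    return max(eligible, key=lambda t: t[2])[0]


# Hypothetical locus: t3 has the highest coverage but is too short to qualify.
locus = [
    ("t1", 1000, 12.0),
    ("t2", 900, 30.0),
    ("t3", 400, 95.0),
]
print(pick_transcript(locus))
```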

Assembled data were compared to NCBI’s nr protein database with BLASTX, 
with an e cutoff of 0.00001. Large data sets were compared to a reduced nr database 
by masking nr sequences from taxa that do not belong to the clade designated 
by NCBI Taxon ID 33154 (Fungi/Metazoa group). Nucleotide sequences were 
translated with a version of the prot4EST pipeline (version 2.3) that had been 
modified to run without user intervention, using these BLASTX results. 
Orthology assignment. The orthology assessment for data set assemblies followed 
the procedure described previously. All-by-all comparisons were conducted with 
BLASTP as in ref. 17. Clustering analyses were conducted on these results by using 
MCL. Following recent analyses, we excluded edges with −log10 
BLASTP e values lower than 20, to reduce spurious cluster connections. We 
examined cluster composition with inflation parameters between 1.1 and 6 and 
found that the final cluster composition was not particularly sensitive to different 
inflation values in this range. We selected an inflation value of 2.1. Clusters with at 
least four taxa and at least one ingroup taxon were aligned by using MAFFT and 
trimmed with Gblocks, and maximum likelihood analyses were conducted with 
RAxML. The assessment of these phylogenies was conducted as in ref. 16. 
Monophyly masking was conducted to reduce the number of monophyletic 
sequences from the same taxon to one sequence. The resultant phylogenies were 
then analysed by an iterative paralogy pruning procedure, by which maximally 
inclusive subtrees with no more than one sequence per taxon were pruned and 
retained. FASTA-formatted files were generated from subtrees that were produced 
by the paralogy pruning procedure. These files were then aligned with MAFFT, 
trimmed with Gblocks, filtered (alignments with fewer than 150 sites were 
excluded) and concatenated into the final matrices. 
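
The edge filter applied before clustering can be sketched as below. This is an illustration only: the sequence identifiers are invented, and the cap used for e values reported as 0 is an assumption of the sketch. The retained (query, subject, weight) triples correspond to the three-column label format that MCL can read.

```python
import math


def edge_weight(evalue, cap=300.0):
    """Convert a BLASTP e value to -log10(e), capping e values of 0.

    The cap is an assumption made for this sketch.
    """
    if evalue <= 0.0:
        return cap
    return min(cap, -math.log10(evalue))


def filter_edges(hits, min_weight=20.0):
    """Keep hits whose -log10(e) is at least min_weight.

    hits: iterable of (query, subject, evalue) tuples.
    Returns a list of (query, subject, weight) tuples.
    """
    kept = []
    for query, subject, evalue in hits:
        weight = edge_weight(evalue)
        if weight >= min_weight:
            kept.append((query, subject, weight))
    return kept


# Hypothetical hits: the second edge is too weak and is dropped.
hits = [
    ("seqA", "seqB", 1e-50),
    ("seqA", "seqC", 1e-5),
    ("seqB", "seqC", 0.0),
]
for query, subject, weight in filter_edges(hits):
    print(f"{query}\t{subject}\t{weight:.1f}")
```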

Phylogenetic analyses. We constructed two phylogenetic matrices from the trans- 
lated sequences. The ‘small’ matrix consists of 301 genes that are present in at least 
20 taxa. It has 50% gene occupancy and is 50,930 sites long. The ‘big’ matrix consists 
of 1,185 genes that are present in at least 15 taxa. It has 40% gene occupancy and is 216,402 
sites long. Both matrices contain data for all of the 46 species included in the study. 

Maximum likelihood analyses were performed for both matrices by using 
RAxML (version 7.2.6) with both the Le and Gascuel (LG) and WAG models 
with each gene region partitioned. Likelihood analyses consisted of first conduct- 
ing a bootstrap analysis with 200 replicates, which was followed by a thorough 
maximum likelihood search. 

Bayesian analyses of the small matrix were conducted with MrBayes (version 
3.1.2) and PhyloBayes (version 3.3b). The big matrix was too large to analyse 
with these tools. With MrBayes, we conducted two searches each with two runs 
(four runs and 16 chains total). We allowed MrBayes to estimate the fixed rate 
model of evolution. Each chain was run for 1,000,000 generations, and conver- 
gence was determined with time-series plots and an estimated sample size of tree 
likelihoods of at least 100. Samples recorded before burn-in were removed, and 
post-burn-in samples of the runs were combined. We summarized the posterior 
probabilities of the clades with majority rule consensus trees. 

We conducted analyses of the reduced-outgroup small matrix with PhyloBayes 
(version 3.3b) using the CAT model of evolution. PhyloBayes misidentified the 
data type of our matrix as DNA, resulting in model misspecification and lack of 
convergence. We conducted the analyses presented here with a modified version 
that was forced to read all matrices as protein sequences. Five PhyloBayes runs 
under the fully parameterized CAT model each converged at around 1,500 cycles 
(at least 86,000 generations) based on time-series plots of the likelihood scores and 
number of partitions. Two runs were continued for 5,000 cycles 
and three runs for 2,500 cycles. The runs estimated 140 (±10) categories for the 
model. We removed pre-burn-in samples and constructed a majority rule consensus 
tree using all five runs (Supplementary Fig. 9). 
Homeostatic response to hypoxia is regulated by the 
N-end rule pathway in plants 


Daniel J. Gibbs, Seung Cho Lee, Nurulhikma Md Isa, Silvia Gramuglia, Takeshi Fukao, George W. Bassel, 
Cristina Sousa Correia, Françoise Corbineau, Frederica L. Theodoulou, Julia Bailey-Serres & Michael J. Holdsworth 


Plants and animals are obligate aerobes, requiring oxygen for 
mitochondrial respiration and energy production. In plants, an 
unanticipated decline in oxygen availability (hypoxia), as caused 
by roots becoming waterlogged or foliage submergence, triggers 
changes in gene transcription and messenger RNA translation that 
promote anaerobic metabolism and thus sustain substrate-level 
ATP production. In contrast to animals, oxygen sensing has 
not been ascribed to a mechanism of gene regulation in response 
to oxygen deprivation in plants. Here we show that the N-end rule 
pathway of targeted proteolysis acts as a homeostatic sensor of 
severe low oxygen levels in Arabidopsis, through its regulation of 
key hypoxia-response transcription factors. We found that plants 
lacking components of the N-end rule pathway constitutively 
express core hypoxia-response genes and are more tolerant of 
hypoxic stress. We identify the hypoxia-associated ethylene res- 
ponse factor group VII transcription factors of Arabidopsis as 
substrates of this pathway. Regulation of these proteins by the 
N-end rule pathway occurs through a characteristic conserved 
motif at the amino terminus initiating with Met-Cys. Enhanced 
stability of one of these proteins, HRE2, under low oxygen con- 
ditions improves hypoxia survival and reveals a molecular mech- 
anism for oxygen sensing in plants via the evolutionarily conserved 
N-end rule pathway. SUB1A-1, a major determinant of submergence 
tolerance in rice, was shown not to be a substrate for the 
N-end rule pathway despite containing the N-terminal motif, indicating 
that it is uncoupled from N-end rule pathway regulation, 
and that enhanced stability may relate to the superior tolerance of 
Sub1 rice varieties to multiple abiotic stresses. 

The N-end rule pathway of targeted proteolysis associates the fate of 
a protein substrate with the identity of its N terminus (the N degron). 
The N-terminal residue is classified as stabilizing or destabilizing, 
depending on the fate of the protein. An N degron containing a destabilizing 
residue is created through specific proteolytic cleavage, but 
can also be generated via successive enzymatic or chemical modifications 
to the N terminus; for example, arginylation by Arg-tRNA protein 
transferases (ATE) (Supplementary Fig. 1). N-end rule pathway substrates 
containing destabilizing residues are targeted for proteasomal 
degradation via specific E3 ligases (also known as N recognins), such 
as PROTEOLYSIS1 and PROTEOLYSIS6 (PRT1 and PRT6) in 
Arabidopsis, which accept substrates with hydrophobic and basic N 
termini, respectively. Several substrates of the N-end rule pathway 
are important developmental regulators in mammals but as yet no 
substrates have been identified in plants. Previously we showed a function 
of this pathway in abscisic acid (ABA) signalling through PRT6 
and ATE, and it has also been associated with leaf senescence and 
shoot and leaf development in Arabidopsis. To understand N-end-rule-pathway-regulated 
gene expression we analysed the transcriptome 
of imbibed seed and seedlings of N-end rule pathway mutants ate1 ate2, 
which lack ATE activity, and prt6 (Fig. 1a and Supplementary 
Table 1). This analysis revealed that genes important for anaerobic 
metabolism and survival of hypoxia, such as ADH1, SUS4 and PDC1, 
were constitutively expressed at high levels in both mutants, in 
common with wild-type Col-0 plants under hypoxia (Supplementary 
Fig. 2). For example, 47 of the 135 differentially regulated mRNAs in the 
wild-type hypoxia-induced transcriptome were also upregulated in 
prt6 seedlings grown under non-stress conditions (Supplementary 
Table 1; signal log2 ratio = 1, false discovery rate = 0.01). The 
mRNAs upregulated in prt6 and ate1 ate2 mutants included over half 
of the core 49 mRNAs upregulated by hypoxia across seedling cell 
types (Fig. 1b and Supplementary Fig. 2). Consistent with this observation, 
β-glucuronidase (GUS) expression driven by the promoter of 
ADH1 (pADH1::GUS; ref. 16) was upregulated in wild-type seedlings 
subjected to hypoxia and ectopically expressed in mature embryos, 
roots and lower hypocotyls of prt6 mutants (Fig. 1c and Supplementary 
Fig. 3). Constitutive expression of hypoxia-induced genes 
by N-end rule pathway mutant seedlings suggested that they would be 
resistant to hypoxic conditions. Imbibed seeds of both prt6 and ate1 
ate2 mutants were able to germinate well under low oxygen (3%) compared 
to wild type (Fig. 1d), and mutant seedlings were more able to 
survive prolonged oxygen deprivation (Fig. 1e, f). The ate1 ate2 double 
mutant showed greater resistance to hypoxia than prt6, indicating the 
existence of other as-yet-unidentified Arg-related E3 ligases, as previously 
postulated. 

Transcription factors of the five-member Arabidopsis ethylene 
response factor (ERF) group VII have recently been shown to 
enhance plant responses to hypoxia or anoxia, including HYPOXIA 
RESPONSIVE1 and 2 (HRE1 and HRE2) and RELATED TO AP2 2 
(RAP2.2). Overexpression of RAP2.12 was also shown to induce 
expression of a pADH1::LUCIFERASE reporter gene. This subfamily 
shows homology to the agronomically important rice ERFs 
SUBMERGENCE 1A, B and C (ref. 3) and SNORKEL 1 and 2 
(ref. 21). SUB1A-1 within the SUBMERGENCE 1 (SUB1) locus (which 
also contains SUB1B and SUB1C) was shown to be a primary determinant 
of enhanced survival of rice plants under complete submergence. 
With the exception of SUB1C, all contain the initiating motif 
Met-Cys (MC) at the N terminus, embedded within a longer consensus 
shared with most other group VII ERFs of Arabidopsis and rice, 
MCGGAII (Supplementary Fig. 4a). 
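
As an illustration of how the N-terminal motif might be screened computationally, the sketch below checks whether a protein sequence begins with Met-Cys and reports where its first seven residues depart from the MCGGAII consensus. The example sequences are invented for illustration and are not the real protein sequences; the function name is likewise hypothetical.

```python
CONSENSUS = "MCGGAII"  # group VII ERF N-terminal consensus quoted in the text


def n_terminal_report(protein, consensus=CONSENSUS):
    """Return (starts_with_MC, mismatches) for a protein sequence.

    mismatches is a list of (position, consensus_residue, observed_residue),
    with 1-based positions; "-" marks a sequence shorter than the consensus.
    """
    protein = protein.upper()
    starts_with_mc = protein.startswith("MC")
    mismatches = []
    for i, expected in enumerate(consensus):
        observed = protein[i] if i < len(protein) else "-"
        if observed != expected:
            mismatches.append((i + 1, expected, observed))
    return starts_with_mc, mismatches


# Invented example sequences (not real proteins).
examples = {
    "full_consensus": "MCGGAIISDFIP",
    "position5_variant": "MCGGEIIPSS",
    "no_MC_start": "MAGRKLQ",
}
for name, seq in examples.items():
    print(name, n_terminal_report(seq))
```

A screen of this kind only identifies candidates: as the in vitro assays show, an MC start is not sufficient for a protein to be an N-end rule pathway substrate.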

Removal of N-terminal methionine by METHIONINE AMINOPEPTIDASE 
(MAP) reveals the tertiary destabilizing residue cysteine 
in proteins initiating with MC, which targets substrates for degradation 
by the N-end rule pathway (Supplementary Fig. 1). In mouse, 
N-end-rule-pathway-mediated degradation of the MC-motif-containing 
G-protein signalling components RGS4 and RGS5 is perturbed under 
hypoxia. It was hypothesized that oxidation of cysteine at position 2 
(C2) in these proteins under normoxia creates a secondary destabilizing 
Figure 1 | N-end rule mutants ectopically accumulate anaerobic response 
mRNAs and are more tolerant to hypoxia. a, Expression data for differentially 
expressed genes comparing wild-type (Col-0) and mutants under air or 
hypoxia (2 h −O2). b, mRNAs upregulated in mutants overlap with 49 mRNAs 
induced across cell types by hypoxia in wild-type seedlings. c, Spatial 


residue allowing addition of arginine (R) to the N terminus by ATE, 
creating a primary destabilizing residue. We investigated the possibility 
that all Arabidopsis group VII ERFs as well as rice SUB1A-1 are N-end 
rule pathway substrates. A heterologous rabbit reticulocyte lysate 
assay was used to express haemagglutinin (HA)-tagged ERFs driven 
by a T7 promoter in vitro, because components of the N-end rule 
pathway (ATE, MAP and PRT6) are highly conserved in eukaryotes, 
and it has been shown that wheat-germ lysate does not contain an 
active proteasomal system. Arabidopsis group VII ERFs were 
short-lived, and their stability was enhanced by MG132 and the 
N-end rule pathway competitive dipeptide Arg-β-Ala, but not by the 
non-competitive Ala-Ala dipeptide (Fig. 2a). Mutation of C2 to 
alanine (C2A), which should remove the N-degron and stabilize 
proteins specifically with respect to the N-end rule pathway, significantly 
enhanced stability in vitro of Arabidopsis ERFs, indicating that 
all group VII ERFs are potential substrates of the N-end rule pathway. 
Arabidopsis contains 206 proteins from gene models with MC at the N 
terminus; we used two of these—VERNALISATION 2 (VRN2) and 
MADS AFFECTING FLOWERING 5 (MAF5), which lack the 
extended N-terminal group VII ERF consensus (Supplementary Fig. 
4b)—to test the specificity of this sequence. Whereas HA-tagged VRN2 
(VRN2-HA) was degraded in this system, and stabilized by the intro- 
duction of a C2A mutation (VRN2(C2A)-HA), MAF5-HA and 
MAF5(C2A)-HA were both stable (Fig. 2b), indicating that not all 
Arabidopsis MC proteins are N-end rule pathway substrates. This is 
not surprising as it has previously been shown that optimal positioning 
of a downstream lysine for ubiquitination is also a key determinant of 
the quality of an N degron. SUB1A-1 was resistant to degradation 
(Fig. 2c). As the N-terminal sequence of SUB1A-1 differs at position 5 
(E rather than A; Supplementary Fig. 4a), we analysed a mutant version 
that replaced this amino acid to reconstitute the consensus group VII 
sequence (SUB1A-1(E5A)-HA). SUB1A-1(E5A)-HA was also stable 
in vitro (Fig. 2c), indicating that degradation of this protein is 
uncoupled from the N-end rule pathway. As expected, the rice protein 
SUB1C-1-HA, lacking an MC N terminus, was long lived in vitro 
(Fig. 2c). 

To confirm the activity of the N-end rule pathway towards specific 
MC-containing substrates in plants, we analysed the in vivo longevity 
of the ERF proteins HRE1 and HRE2 (Fig. 2d). We expressed either 
wild-type or mutant (HRE1(C2A), HRE2(C2A)) HA-tagged versions 
of these proteins ectopically using the CaMV35S promoter in 
Arabidopsis. In wild-type plants, only the mutant C2A proteins could 
be detected at high levels, despite detectable expression of correspond- 
ing mRNAs, indicating that wild-type versions are N-end rule pathway 
substrates in vivo. HRE2-HA expressed in the prt6 mutant was stable, 
linking its degradation directly to PRT6. To assess whether oxygen 
regulates the stability of HRE proteins, we analysed the accumulation 
of HRE-HA proteins in wild-type plants expressing HRE1-HA, 
HRE1(C2A)-HA, HRE2-HA and HRE2(C2A)-HA under normal 
and low oxygen conditions (Fig. 3a). After transfer of seedlings to 
hypoxic conditions we observed elevation of HRE2-HA within 2 h, 
but could not detect HRE1-HA (Fig. 3a and Supplementary Fig. 5a, b). 
HRE2-HA became destabilized again upon return to normoxic conditions 
(Fig. 3a). Both seeds and seedlings ectopically expressing stable 
C2A versions of HRE1 and HRE2 had increased tolerance to extended 
periods of oxygen deprivation (Fig. 3b–d and Supplementary Fig. 5c). 

These data demonstrate that Arabidopsis ERF group VII transcrip- 
tion factors are substrates of the N-end rule pathway, and function to 
sense molecular oxygen, most likely through oxidation of the tertiary 
destabilizing residue cysteine. Stabilization of these proteins under 
hypoxic conditions leads to increased survival under low oxygen stress 
(Fig. 3e). It is currently unclear whether oxidation occurs through a 
chemical or enzymatic mechanism, although cysteine is readily 
oxidized chemically. It is also unclear whether oxidation is related 
directly to molecular oxygen, or if indirect cellular changes associated 
with oxygen availability (such as alterations in cytosolic pH and 
specific metabolites or transient accumulation of reactive oxygen 
species) might trigger cysteine oxidation. SUB1A-1 may provide 
enhanced responsiveness to submergence and drought in rice in part 
due to the fact that it is not a substrate of the N-end rule pathway. By 
contrast, the condition-dependent destabilization of group VII ERFs 
in Arabidopsis could require oxygen levels to decline below some 
threshold before these factors can activate anaerobic gene transcrip- 
tion. It is probable that SUB1A-1 evades the N-end rule pathway due to 
the absence of an optimally positioned lysine downstream of the N 
degron, as substrate quality is determined combinatorially by an 
N degron destabilizing residue and downstream lysine position. 
Alternatively, differences in protein tertiary structure may preclude 
Figure 2 | Group VII ERF transcription factors are substrates for the N-end 
rule pathway in vitro and in vivo. a, Western blot analysis of in vitro stability 
of HA-tagged wild-type and C2A variants of Arabidopsis group VII ERFs in the 
absence or presence of MG132, N-end rule pathway competitive dipeptide 
(Arg-β-Ala) or non-competitive dipeptide (Ala-Ala). b, In vitro stability of 
wild-type and C2A VRN2-HA and MAF5-HA. c, In vitro stability of HA-tagged 
rice ERFs. d, In vivo protein stability and RNA expression levels of wild-type 
and C2A variants of HRE1-HA and HRE2-HA ectopically expressed in 
Arabidopsis, shown for two independent transformed lines (1 and 2). 


N-terminus accessibility. SUB1A-1 was also recently shown to mediate 
crosstalk between submergence and drought tolerance in rice by 
augmenting ABA responsiveness, suggesting a link between drought 
tolerance and the previously identified function of the N-end rule 
pathway in removing responsiveness to ABA. Targeted degradation 
of proteins by the N-end rule pathway was identified as a homeostatic 
mechanism in mammalian systems, for example in the control of 
hypoxia-related expression of RGS4 (ref. 28) and RGS5 (ref. 23). It is 
fascinating that the N-end rule pathway carries out the same function 
in relation to low oxygen stress in plants, but takes as substrates 
members of a plant-specific transcription factor family. This highlights 
evolutionary conservation of the mechanism of oxygen perception 
across kingdoms using the N-end rule pathway independent of the 
targets. Our confirmation of in vivo function of two members of ERF 
Figure 3 | HRE proteins are stabilized under low oxygen levels and confer 
hypoxia tolerance. a, In vivo stability of wild-type and C2A HRE1-HA, 
HRE2-HA (anti-HA) or S6 (ribosomal protein S6) control (anti-S6). HS, 2 h 
hypoxia; NS, no stress; R, following 1 h recovery from stress. b, Seedlings 
expressing wild-type or C2A HRE1-HA and HRE2-HA after 12 h hypoxic 
stress and 3 d of recovery. Scale bar: 0.6 cm. c, Seedling survival for wild-type or 
C2A HRE1-HA and HRE2-HA after 9 h or 12 h hypoxic stress. Data are mean 
of replicate experiments ± s.d. *P < 0.05; **P < 0.01. d, Germination under 
reduced oxygen availability. e, Model explaining N-end-rule-pathway- 
mediated oxygen-dependent turnover of group VII ERFs in Arabidopsis. 


group VII provides direct evidence for the control of HRE2 by oxygen 
and the N-end rule pathway and indirect evidence that HRE1 is also an 
N-end rule pathway substrate in vivo. We demonstrate that all mem- 
bers of Arabidopsis group VII ERFs are N-end rule pathway substrates 
in vitro, and thus it is possible that all members orchestrate N-end-rule- 
pathway-controlled, hypoxia-related functions. Identification and 
manipulation of N-end rule pathway substrates will therefore be a key 
target for both conventional breeding and biotechnological approaches 
in relation to manipulation of plant responses to abiotic stress. 


METHODS SUMMARY 


Protein stability analyses. Full-length cDNAs were amplified by polymerase 
chain reaction (PCR) from either Arabidopsis thaliana or Oryza sativa L. 
(cv. M202(Sub1)). N-terminal mutations were introduced using the forward 
primer (Supplementary Table 2). For in vitro assays, cDNAs were cloned into a 
modified version of the pTNT vector (Promega) to produce C-terminal HA 
fusions. Stability assays were performed using the TNT T7 Coupled 
Reticulocyte Lysate system (Promega), essentially as described previously. For 
in vivo analysis of HRE-HA proteins, cDNAs were cloned into pE2c, mobilized 
into pB2GW7 and transformed into Arabidopsis using the floral dip method. To 
assess relative protein stability, equal amounts of total protein extracted from 
7-day-old T; homozygous seedlings were analysed by western blot, and cDNA 
synthesized from total RNA was used as a template for semi-quantitative PCR. 
Gene expression analyses. For microarray analysis, total RNA extracted from 
seeds’* or seedlings'* was hybridized against the Arabidopsis ATH1 genome array 
(Affymetrix). Differentially expressed genes were clustered as described previ- 
ously'’. pADH::GUS'* was crossed to prt6-1 and homozygous seeds or seedlings 
were analysed for GUS activity before and after submergence for the times indicated. 
Low O2 phenotypic analyses. To assess germination (scored as radicle emergence), 
imbibed seeds were incubated for 7 days in chambers flushed with varying 
O2 tensions. For 7-day-old seedling survival, O2 deprivation was achieved by 
bubbling 99.995% argon through water into chambers under positive pressure, 
before recovering in air for 3 days and scoring of plants (n = 15) per plate that were 
non-damaged, damaged or dead (scored 5, 3 and 1, respectively). The same argon 
chambers were used to treat seedlings for the times indicated before protein 
extraction for western blot analysis. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS 

Growth and analysis of plant material. Arabidopsis thaliana seeds were obtained 
from NASC, except for transgenics containing pADH::GUS (ref. 16) (a gift from 
R. Ferl). Columbia-0 (Col-0) was the wild type for all analyses. prt6-1, prt6-5 and 
ate1 ate2 mutants were described previously'*"*. For the generation of transgenic 
Arabidopsis and in vivo protein assays, plants were grown vertically on half MS 
media for 7 days at 22 °C in 150 µmol m⁻² s⁻¹ constant light and transferred to 
soil after 2 weeks if required. For analysis of seedling O2 deprivation survival and 
protein analysis, plants were grown vertically on MS medium (0.43% (w/v) MS 
salts, 1% (w/v) Suc and 0.4% (w/v) phytagel, pH 5.75) at 23 °C with a 16-h-day (50 
µmol m⁻² s⁻¹) and 8-h-night cycle for 7 d. The rice (Oryza sativa L.) SUB1 
introgression line cv. M202(Sub1) was grown and submerged before cDNA isola- 
tion as described previously’. All plant experiments were carried out at least three 
times. 

Analysis of oxygen deprivation response in seeds and seedlings. Seven-day-old 
Arabidopsis seedlings were subjected for specified durations to non-stress (NS) or 
hypoxia stress (HS) treatments, or subjected to hypoxia stress and returned to 
ambient air (re-oxygenation; R). For seedling survival, 15 Col-0 and 15 mutant 
seedlings were grown side by side (3 replicates). Treatments commenced at the end 
of the 16-h light cycle in open (NS) or sealed (HS) chambers. For HS, 99.995% 
argon gas was bubbled through water and into the chamber while air was expelled 
by positive pressure”. After treatment, the 15 seedlings per genotype per plate 
were scored as non-damaged, damaged and dead (scored 5, 3 and 1, respectively) 
compared to wild-type plants grown on the same plate and results analysed using 
Student's t-test, as described previously, or seedlings were frozen under liquid 
nitrogen within 3 min of release before protein extraction. 
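
The scoring scheme above can be illustrated with a minimal Python sketch. The text does not state how the per-seedling scores were aggregated, so the sketch assumes that each genotype on a plate is summarized by its mean score. The function name `plate_score` and the example calls are hypothetical and are not data from this study.

```python
from statistics import mean

# Score values taken from the scoring scheme described in the text.
SCORES = {"non-damaged": 5, "damaged": 3, "dead": 1}


def plate_score(calls):
    """Mean survival score for the seedlings of one genotype on one plate.

    Assumption: scores are averaged per plate; the source only gives the
    per-seedling score values (5, 3 and 1).
    """
    if not calls:
        raise ValueError("at least one seedling call is required")
    return mean(SCORES[call] for call in calls)


# Hypothetical calls for the 15 seedlings of one genotype on one plate.
example_calls = ["non-damaged"] * 6 + ["damaged"] * 5 + ["dead"] * 4
example_score = plate_score(example_calls)
```

Per-plate scores of mutant and wild-type seedlings grown side by side could then be compared with a t-test, as the text describes.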

Germination of Arabidopsis seeds (3-4 replicates of n = 60-100; scored on day 
7 as radicle emergence) was performed at 22 °C under constant light in various 
oxygen tensions achieved through mixing N2 and air via capillary tubes according 
to the apparatus described previously”. 

Wild-type plants carrying the pADH::GUS transgene’® were crossed to prt6-1 
plants and homozygous prt6-1 pADH::GUS individuals were identified in the F2 
population. Seven-day-old seedlings were submerged in degassed water in the dark 
to induce hypoxia for the times indicated. Embryos were dissected 6 h after being 
imbibed. Seedlings and embryos were assayed for GUS activity and imaged fol- 
lowing standard methods”. 

Construction of transgenic plants and protein and RNA extractions. To generate 
C-terminally HA-tagged ERF fusions of HRE1 (At1g72360) and HRE2 
(At2g47520) driven by the 35SCaMV promoter, full-length cDNAs amplified 
from Arabidopsis total seedling cDNA were first ligated into the Entry vector 
pE2c and then mobilized into the Destination binary vector pB2GW7, as described 
previously*’. N-terminal mutations were incorporated by changing the forward 
primer sequences accordingly (Supplementary Table 2). Transformation into 
Agrobacterium tumefaciens (strain GV3101 pMP90) and Arabidopsis thaliana 
was performed according to established protocols™. Proteins were extracted from 
7-day-old homozygous T; seedlings as described**. Extracts were quantified using 
the Bio-Rad DC assay and subjected to anti-HA immunoblot analysis. For semi- 
quantitative RT-PCR, RNA was extracted using an RNEasy plant mini kit 
(Qiagen) and converted to cDNA using Superscript III Reverse transcriptase 
(Invitrogen). PCRs were performed with transgene-specific primers (gene-specific 
forward, HA-tag reverse) and ACTIN-2 was amplified for use as a loading control 
(Supplementary Table 2). 

In vitro analysis of protein stability. To generate Arabidopsis and rice protein- 
HA fusions driven by the T7 promoter, cDNAs were PCR amplified from 
Arabidopsis total cDNA or submerged rice cDNA (M202(Sub1)), as described, 
and ligated into a modified version of the pTNT (Promega) expression vector 
(pTNT3xHA). N-terminal mutations were incorporated by changing the forward 
primer sequences accordingly (Supplementary Table 2). 
Proteins were expressed in vitro using the TNT T7 Coupled Reticulocyte Lysate 
system (Promega) according to the manufacturer’s guidelines, using 500 ng plasmid 
template. Where appropriate, 100 µM MG132 or 1 mM dipeptides (Arg-β-Ala or 
Ala-Ala; Sigma-Aldrich) and 150 nM bestatin (Sigma-Aldrich) were added. 
Reactions were incubated at 30 °C, and samples were taken at indicated time 
points before mixing with protein loading dye to terminate protein synthesis. 
Equal amounts of each reaction were subjected to anti-HA immunoblot analysis. 
All blots were checked for equal loading by Ponceau staining. 

Immunoblotting. Proteins resolved by SDS-PAGE were transferred to PVDF 
using a MiniTrans-Blot electrophoretic transfer cell (Bio-Rad). Membranes were 
probed with primary antibodies at the following titres: anti-HA (Sigma-Aldrich), 
1:1,000; anti-α-tubulin (Sigma-Aldrich), 1:5,000; anti-ribosomal protein S6 (ref. 
36), 1:5,000. HRP-conjugated anti-mouse secondary antibody (Santa Cruz) was 
used at a titre of 1:10,000. Immunoblots were developed to film using ECL western 
blotting substrate (Pierce). 

Alignment of MC-ERF proteins from Arabidopsis and rice. Rice and 
Arabidopsis ERF proteins starting with the sequence MC were aligned and phylo- 
genetic relationships observed using CLUSTALW”. 

Microarray hybridization and data analyses. Total RNA extracted from seeds or 
seedlings was assessed for quality using the Agilent 2100 Bioanalyser with the RNA 
6000 Nano reagent kit. Biotin-labelled cRNA was synthesized using the Affymetrix 
3' IVT Express Labelling kit and hybridized against the Arabidopsis ATH1 genome 
array (GeneChip System, Affymetrix). CEL file data were processed to estimate the 
abundance of each expressed mRNA in two (seedling) or three (imbibed seed) 
biological replicate samples as described previously’. The microarray experiments 
reported here are described following MIAME guidelines and are deposited in 
GEO under the accession number GSE29941. 

The differentially expressed genes were further analysed by use of fuzzy k-means 
clustering with the FANNY function from the Cluster package in R, as described’. 
The resulting gene-to-cluster assignments are given in Supplementary Table 1 and 
were visualized with the TIGR MEV program. Each gene cluster was evaluated for 
enrichment of specific gene functions (Gene Ontology (GO)) as described previously 
using Arabidopsis gene-to-GO mappings from TAIR (http://geneontology.org; 
downloaded 17 May 2011). 

Oxygen sensing in plants is mediated by an N-end 
rule pathway for protein destabilization 


Francesco Licausi, Monika Kosmacz, Daan A. Weits, Beatrice Giuntoli, Federico M. Giorgi, Laurentius A. C. J. Voesenek, 
Pierdomenico Perata & Joost T. van Dongen 


The majority of eukaryotic organisms rely on molecular oxygen 
for respiratory energy production’. When the supply of oxygen is 
compromised, a variety of acclimation responses are activated to 
reduce the detrimental effects of energy depletion” *. Various 
oxygen-sensing mechanisms have been described that are thought 
to trigger these responses” °, but they each seem to be kingdom 
specific and no sensing mechanism has been identified in plants 
until now. Here we show that one branch of the ubiquitin-dependent 
N-end rule pathway for protein degradation, which is active in 
both mammals and plants'®"', functions as an oxygen-sensing 
mechanism in Arabidopsis thaliana. We identified a conserved 
amino-terminal amino acid sequence of the ethylene response 
factor (ERF)-transcription factor RAP2.12 to be dedicated to an 
oxygen-dependent sequence of post-translational modifications, 
which ultimately lead to degradation of RAP2.12 under aerobic 
conditions. When the oxygen concentration is low—as during 
flooding—RAP2.12 is released from the plasma membrane and 
accumulates in the nucleus to activate gene expression for hypoxia 
acclimation. Our discovery of an oxygen-sensing mechanism opens 
up new possibilities for improving flooding tolerance in crops. 
Tolerance to submergence and low oxygen availability (hypoxia) 
has been considered to be influenced by different members of subgroup 
VII of the ERF transcription factor family in Arabidopsis 
(RAP2.12 (ref. 12), RAP2.2 (ref. 13); HRE1 and HRE2 (ref. 14)) and 
rice (SUB1 (ref. 15), SK1 and SK2 (ref. 16)). Here, we reveal the mechanism 
by which molecular oxygen acts upon RAP2.12 (At1g53910) to 
trigger molecular acclimation responses. RAP2.12 is highly homolog- 
ous to RAP2.2 and is widely conserved in higher plants (Supplemen- 
tary Fig. 1). It is constitutively expressed throughout the entire plant 
(Supplementary Fig. 2) and further upregulated in leaves upon hypoxia, 
but not by the ethylene precursor 1-aminocyclopropane-1-carboxylic- 
acid (ACC) (Supplementary Fig. 3). RAP2.12 positively regulates gene 
transcription in planta via a conserved carboxy-terminal motif (Sup- 
plementary Fig. 4). Constitutive overexpression of RAP2.12 (35S::RAP2.12) 
did not significantly affect the phenotype of Arabidopsis plants when 
grown aerobically (Fig. 1a, b). However, submergence tolerance of 
independently transformed 35S::RAP2.12 plants increased with 
respect to the wild-type control, as demonstrated by the increased 
number and dry weight of plants that recovered from submergence 
(Fig. 1a, c, d), which can be explained by the faster and stronger induction 
of hypoxia-responsive genes during the flooding treatment in 
35S::RAP2.12 plants (Supplementary Fig. 5). Interestingly, different 
flooding-tolerance strategies in two wild Rumex species correlated 
with the differential induction of ERF1, which is the orthologue of 
RAP2.12 (Supplementary Fig. 6). In contrast, constitutive expression 
of RAP2.12 with a haemagglutinin (HA)-peptide tag at its N terminus 
(35S::HA::RAP2.12) resulted in a reduction of plant growth in air 
(Fig. 1a, b). Concomitantly, tolerance to submergence decreased as 
compared to the wild type (Fig. 1c). Similar results were observed when 
a version of RAP2.12 was expressed from which the first 13 amino 
acid residues were deleted (35S::Δ13RAP2.12). It thus seemed that 
manipulating the N-terminal amino acid sequence obstructed the 
regulative function of RAP2.12 already under aerobic conditions, 
thereby reducing the vigour and stress tolerance of the plants. 

Figure 1 | The transcription factor RAP2.12 regulates hypoxia tolerance of 
plants. a, The effect of overexpression of RAP2.12, HA::RAP2.12 or 
Δ13RAP2.12 on plant growth in air, or after submergence. Scale bar, 2 cm. 
b, Dry weight of 7-week-old rosette leaves from air-grown plants (n = 20). 
c, Percentage of plants surviving flooding-induced hypoxia (n = 4). d, Dry 
weight of rosette leaves from surviving plants, 2 weeks after the flooding 
treatment (n = 20). e, Differential expression of hypoxia-responsive genes 
(reference: wild type in air). Numeric expression values are shown in 
Supplementary Table 1. Data are presented as mean ± s.d. *P < 0.05, one-way 
ANOVA. 

To understand the impact of the N-terminal modifications on the 
activity of RAP2.12, we investigated which genes are expressed under 
the control of RAP2.12. We found that under aerobic conditions 
35S::RAP2.12 plants exhibited a slight increase in the expression of 
hypoxia marker genes, whereas during flooding the expression of these 
genes was more strongly upregulated in plants overexpressing 
RAP2.12 than in wild-type plants 
(Fig. 1e and Supplementary Table 1). During re-oxygenation, the 
expression of the hypoxia marker genes was rapidly downregulated in 
both wild-type plants and 35S::RAP2.12, whereas in 35S::HA::RAP2.12 
and 35S::Δ13RAP2.12 the level of expression remained high, as before 
the flooding treatment. The correlation between this expression pattern 
of hypoxia response genes and the reduced growth and recovery after 
flooding that is observed for plants overexpressing HA::RAP2.12 and 
Δ13RAP2.12 indicates that proper upregulation of the hypoxia response 
genes during flooding as well as downregulation of these genes during 
recovery from flooding are both required for optimal plant acclimation. 
The activation of hypoxic gene expression by RAP2.12 was further 
confirmed by the observation that RAP2.12 induced a luciferase 
(Luc) reporter gene when its promoter contained the motif ATCTA 
(Supplementary Fig. 7), which was previously identified as a hypoxia-responsive 
element in plants. On the other hand, the relatively small 
effect of 35S::RAP2.12 on gene expression under aerobic conditions 
(Fig. 1e) indicated that an additional regulatory mechanism reliant 
on sensing of low oxygen concentrations is needed to induce hypoxic 
gene expression. Interestingly, this requirement was abolished when the 
N terminus of RAP2.12 was modified either by fusing the HA-peptide 
tag to the protein (35S::HA::RAP2.12 in Fig. 1e), or by deleting its first 
conserved amino acid residues (35S::Δ13RAP2.12 in Fig. 1e and 
Supplementary Fig. 8), indicating that the N terminus of RAP2.12 
has an important role in the regulation of the oxygen-dependent activation 
of the transcription factor. 

Further comparative analysis of a full-genome expression profile of 
the hypoxic response in wild-type plants and the differential regulation 
of genes by expressing the HA::RAP2.12 construct under aerobic conditions 
revealed that the genes that were most strongly up- or downregulated 
by HA::RAP2.12 were also differentially expressed under hypoxia 
(Supplementary Fig. 9 and Supplementary Table 2). Similarly, the 
silencing of RAP2.12 and its closest homologue RAP2.2 using an 
artificial microRNA approach reduced the induction of hypoxic gene 
expression by low oxygen (Supplementary Fig. 10 and Supplemen- 
tary Tables 3 and 4). Given that the messenger RNA stability of 
RAP2.12 was not affected by the additional nucleotides encoding the 
N-terminal peptide tag (Supplementary Fig. 11), we concluded that 
post-translational modifications of the N-terminal amino acid residues 
of RAP2.12 are involved in regulating the activity of this transcription 
factor, which is required to induce hypoxia core-response genes. 

We further investigated the role of the N-terminal amino acid residues 
by determining the subcellular localization of RAP2.12 fused to green 
fluorescent protein (GFP). Under aerobic conditions, the fusion protein 
localized to the plasma membrane; however, upon hypoxia it accumulated 
in the nucleus (Fig. 2a and Supplementary Fig. 12). Remarkably, 
upon re-oxygenation the RAP2.12::GFP signal fully disappeared within 
1 h (Fig. 2a). After deleting the conserved N-terminal amino acid 
residues of RAP2.12 (35S::Δ13RAP2.12::GFP), the transcription 
factor was observed in both the cell membrane and the nucleus 
under aerobic conditions (Fig. 2a). However, under hypoxia, the 
membrane association disappeared, the protein accumulated 
only in the nucleus and remained there even after re-oxygenation 
(35S::Δ13RAP2.12::GFP in Fig. 2a). Thus, manipulation of the conserved 
N terminus of RAP2.12 seems to affect the oxygen-dependent 
subcellular localization of the transcription factor and, moreover, 
stabilizes the protein under aerobic conditions. 

Figure 2 | RAP2.12 is membrane localized and re-localizes to the nucleus 
upon hypoxia. a, Subcellular localization of stably transformed GFP-fused 
RAP2.12 and Δ13RAP2.12. Localization controls are shown in Supplementary 
Fig. 12. b, Yeast two-hybrid analysis showing interaction between RAP2.12 and 
ACBP1 and ACBP2. c, Bimolecular fluorescence complementation of YFP 

As RAP2.12 has no hydrophobic domains that could explain its 
localization at the plasma membrane, we searched for interaction 
partners of the transcription factor. Yeast two-hybrid analyses 
(Fig. 2b) and bimolecular fluorescence complementation (BiFC) 
analysis (Fig. 2c) revealed an interaction between RAP2.12 and the 
membrane-localized acyl-CoA-binding proteins ACBP1 and ACBP2 
(ref. 18), as had been shown previously for ACBP2 and RAP2.3 
(ref. 19). The interaction between RAP2.12 and ACBP depended on 
an amino acid sequence between position 123 and 177, which covers 
the RAYD motif, a sequence already known to mediate protein- 
protein interactions” (Fig. 2d). 

The essential role of the N-terminal residues of RAP2.12 is further 
supported by the conservation of the first amino acids in almost all 
members of ERF subfamily VII (Supplementary Fig. 8). The specific 
sequence of their conserved N terminus qualifies ERF-VII proteins as 
candidate substrates of the N-end rule pathway (Fig. 3a). 
According to this pathway, the terminal Met is removed from the 
protein by methionine aminopeptidase (MetAP) when the second 
amino acid of the protein is Cys (Supplementary Fig. 13 and 
Supplementary Table 5). Terminal Cys is oxidized to cysteine sulphenic 
acid in an oxygen-dependent manner before arginine transferase (ATE) 
conjugates an Arg residue to the protein. This triggers subsequent 
ubiquitination by the ligase PROTEOLYSIS 6 (PRT6) and targets the 
protein to the proteasome for degradation, which can occur in both 
the cytosol and the nucleus. Transient expression of RAP2.12::GFP in 
ate1ate2 or prt6 knockout plants resulted in accumulation of the transcription 
factor in the nucleus both during aerobic and hypoxic conditions 
as well as after re-oxygenation (Fig. 3b), similar to what we 
observed by deleting the N-terminal sequence (Fig. 2a) or after incubation 
with the proteasome inhibitor MG132 (Supplementary Fig. 14). 
Western blot analyses showed that the amount of RAP2.12 increased 
under hypoxia and decreased again after re-oxygenation in the wild 
type but not in ate1ate2 or prt6 (Fig. 3c). The tolerance to submergence 
of ate1ate2 and prt6 rosette plants was reduced (Supplementary 
Fig. 15), in line with the negative impact of 35S::HA::RAP2.12 and 
35S::Δ13RAP2.12 on survival (Fig. 1a, b). Lastly, exchanging the 
N-terminal Cys with Ala (35S::MAG-RAP2.12::GFP) resulted in a 
GFP signal in the nucleus, similar to what we observed in any of the 
other approaches to modify the N-end rule pathway (Fig. 3d). All this 
indicates that the lifetime of RAP2.12 is controlled by the N-end rule 
pathway for proteasomal protein degradation. 

Figure 4 | Model describing the oxygen sensor mechanism in plants. The 
transcription factor RAP2.12 is constitutively expressed under aerobic 
conditions. RAP2.12 protein is always present, bound to ACBP to prevent 
RAP2.12 from moving into the nucleus under aerobic conditions and to protect 
it against proteasomal degradation in air. Upon hypoxia, RAP2.12 moves into 
the nucleus, where it activates anaerobic-gene expression. Upon re-oxygenation, 
RAP2.12 is rapidly degraded via the N-end rule pathway and 
proteasome-mediated proteolysis to downregulate the hypoxic response. 
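
The sequence of steps in the Cys branch of the N-end rule pathway, as described in the text, can be summarized as a toy decision function. This is a deliberately simplified, hypothetical sketch: it treats every step as an all-or-nothing switch, ignores ACBP binding and subcellular localization, and the function name `n_end_rule_fate` and its parameters are illustrative only.

```python
def n_end_rule_fate(sequence, oxygen_present, ate_active=True, prt6_active=True):
    """Return 'degraded' or 'stable' for a protein with the given N terminus.

    Toy model of the steps described in the text; each step is treated as a
    simple on/off switch, which is an assumption made for illustration.
    """
    # Step 1: MetAP removes the initiator Met when the second residue is Cys,
    # exposing Cys at the N terminus.
    if len(sequence) < 2 or sequence[0] != "M" or sequence[1] != "C":
        return "stable"
    # Step 2: the exposed Cys is oxidized only in the presence of oxygen.
    if not oxygen_present:
        return "stable"
    # Step 3: arginine transferase (ATE) adds Arg to the oxidized Cys.
    if not ate_active:
        return "stable"
    # Step 4: PRT6 ubiquitinates the protein, targeting it to the proteasome.
    if not prt6_active:
        return "stable"
    return "degraded"


# Examples using the conserved N terminus quoted in the Methods (MCGGAII).
fate_in_air = n_end_rule_fate("MCGGAII", oxygen_present=True)
fate_in_hypoxia = n_end_rule_fate("MCGGAII", oxygen_present=False)
fate_cys_to_ala = n_end_rule_fate("MAGGAII", oxygen_present=True)
fate_in_ate_mutant = n_end_rule_fate("MCGGAII", oxygen_present=True, ate_active=False)
```

In this simplification, the wild-type N terminus is degraded only in air with both ATE and PRT6 active, which matches the qualitative behaviour reported for the wild type, the ate1ate2 and prt6 mutants and the Cys-to-Ala variant.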

Next, we investigated whether an oxygen-dependent N-end rule 
pathway is active in plants and whether it regulates the oxygen- 
dependent activation of hypoxic gene expression. Fusion of the first 
conserved N-terminal amino acid residues from RAP2.12 to the Luc 
reporter protein resulted in an increase of the normalized Luc activity 
under hypoxic conditions and reduced Luc activity upon re-oxygenation, 
as predicted by the Cys-oxidation-dependent branch of the N-end rule 
pathway (Fig. 3e). In addition, constitutive upregulation of hypoxia 
marker genes was observed under aerobic conditions in plants with 
reduced ATE and PRT activities (Fig. 3f). This indicates that the 
oxygen-dependent oxidation of the terminal Cys of RAP2.12 prevents 
hypoxic gene expression via the destabilization of RAP2.12 in air. Only 
when the oxygen concentration decreases is Cys oxidation prevented, 
and the now stably accumulating RAP2.12 can induce the expression of 
genes involved in the hypoxic response (Fig. 4). Here, we have shown 
that this oxygen-dependent Cys oxidation is adopted by the ERF-VII 
factor RAP2.12 and—together with its oxygen-dependent re-local- 
ization—triggers the hypoxia-acclimation response in Arabidopsis. 


METHODS SUMMARY 


Unless specifically indicated in the text, low oxygen (hypoxia) conditions used in 
this study were always maintained at 1% (v/v) oxygen. Full details of the materials 
and experimental procedures are provided in Methods. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS 


Plant materials. A. thaliana Columbia-0 (Col-0) was used as wild-type ecotype, as 
described in the figure legends. Double ate1ate2 knockout seeds were provided by 
E. Graciet, prt6 knockout seeds (line EOL4) were obtained from the Institute of 
Agronomic Research, 35S::GFP seeds were provided by M. Kawai-Yamada. 
Growth conditions and phenotypic evaluation. Seeds were sown in moist soil, 
stratified at 4 °C in the dark for 48 h and germinated at 22 °C day/18 °C night with 
a photoperiod of 8 h light and 16 h darkness. For all experiments 5-week-old plants 
were used. Low oxygen (1% (v/v) oxygen in air) treatments were performed as 
described previously’’. Flooding tolerance was assayed using three independent 
transgenic lines. Plants were submerged with deionized water in 15-cm-high 
plastic boxes and kept in the dark. Leaves were at 5 cm under the water surface. 
After 84h, the water was removed from the boxes and photoperiodic conditions 
(8h/16h, light/dark) were restored. Tolerance assays were repeated four times 
by using 10-20 plants per genotype each time. Rumex spp. cultivation and 
submergence treatment were performed as described previously”’. 

Cloning of the various constructs. Coding sequences (CDSs) were amplified from 
a cDNA template using Phusion High Fidelity DNA-polymerase (New England 
Biolabs). An artificial microRNA (amiRNA) against RAP2.12 was generated by 
overlapping PCR using the pRS300 vector as backbone. All open reading frames 
were cloned into pENTR/D-TOPO (Invitrogen). The resulting entry vectors were 
recombined into destination vectors using the LR reaction mix II (Invitrogen) to 
obtain the expression vectors. A complete list of all destination vectors and primers 
used is provided in Supplementary Tables 6 and 7, respectively. 

Plant transformation. Stable transgenic plants were obtained using the floral dip 
method. T0 seeds were screened for kanamycin or phosphinothricin resistance 
and single-insertion lines were identified as described previously. Transient leaf 
transformations using 3-week-old plants were performed as described previously. 
All transient expression assays were repeated at least three times using 
independently grown plants. Each time the experiment was repeated, we transformed 
leaves from three independent plants. Thus, at least nine independent transformations 
from at least three different plant cultures were analysed. 
qRT-PCR. RNA extraction, removal of genomic DNA, cDNA synthesis and 
qRT-PCR analyses were performed as described previously. For 35S::RAP2.12, 
35S::HA::RAP2.12, amiRAP2.2-12 and 35S::Δ13RAP2.12 three independent transgenic 
lines were used and the average expression value was calculated. For all the 
other genotypes, three independent biological replicates were used. 
Microarrays. Three independent RAP2.12 overexpressors or RAP2.2-RAP2.12 
silenced lines were grown in soil for 5 weeks and then subjected to a treatment 
with 1% oxygen in the dark for 90 min. Total RNA from whole rosettes was 
extracted as described for the qRT-PCR analyses. Hybridization and scanning 
procedures were performed by NASC (http://arabidopsis.info/). Microarray ana- 
lysis and data quality control were performed as described previously’’ using 
Robin’. Normalization of the raw data and an estimation of signal intensities 
were carried out using the Genechip Robust Multiarray Average (GC-RMA) 
methodology’. Differential gene expression analysis was carried out using 
limma”’, with a Benjamini-Hochberg P-value correction*’. Microarray data sets 
were deposited in a public repository with open access (accession number 
GSE29187; http://www.ncbi.nlm.nih.gov/geo/). 
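
For readers unfamiliar with the Benjamini-Hochberg correction mentioned above, the following is a minimal Python sketch of the standard step-up adjustment. It is an illustration only, not the authors' code: in the study the correction was applied through limma, and the function name `benjamini_hochberg` and the example P values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values, in the input order."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values for four genes (not data from the paper).
example_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Genes whose adjusted P value falls below the chosen threshold would be called differentially expressed at that false discovery rate.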

Confocal imaging. For GFP and YFP imaging, leaves from independent stable or 
transiently transformed 4-week-old plants were analysed with a Leica DM6000B/ 
SP5 confocal microscope (Leica Microsystems). 

Reporter transactivation assay. Arabidopsis mesophyll protoplasts were used to 
identify the region responsible for the trans-activation activity of RAP2.12. The 
DNA-binding domain from Saccharomyces cerevisiae was fused at the N terminus 
of RAP2.12 and its deletion variants. The UAS fused to a minimal 35S promoter 
was inserted into pGreenII-800LUC to generate a reporter and normalization 
vector. Non-recombined pDBD-GW vector was used as a negative control. 
Protoplasts were prepared according to a previously described method and 
transfected using 5 µg plasmid DNA each. A dual luciferase reporter assay was 
performed as described previously. 

Protein stability assay using the Luc reporter system. Leaves of A. thaliana 
Col-0 were transformed with either a 35S::PpLuc or a 35S::MCGGAII::PpLuc 
construct (both also containing a 35S::RrLuc cassette for normalization purposes). 
Normalized luciferase activity (PpLuc/RrLuc) was measured as described 
previously and Luc protein stability was evaluated as the ratio between 
MCGGAII::PpLuc and PpLuc transfected leaves. The experiment was repeated 
three times using five independent replicates in each repetition. 
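
The two ratios described in this assay can be written out as a short Python sketch. The function names and the example readings are hypothetical; only the definitions (PpLuc/RrLuc for normalization, and the tagged-to-untagged ratio as the stability measure) come from the text.

```python
def normalized_activity(pp_luc, rr_luc):
    """Normalized luciferase activity: PpLuc signal divided by RrLuc signal."""
    if rr_luc <= 0:
        raise ValueError("RrLuc signal must be positive")
    return pp_luc / rr_luc


def relative_stability(tagged, untagged):
    """Ratio of normalized activity: MCGGAII::PpLuc over untagged PpLuc.

    Each argument is a (PpLuc, RrLuc) pair of raw readings.
    """
    return normalized_activity(*tagged) / normalized_activity(*untagged)


# Hypothetical raw readings (arbitrary units), not data from the paper.
example_ratio = relative_stability(tagged=(200.0, 100.0), untagged=(800.0, 100.0))
```

A ratio below 1 would indicate that the reporter carrying the RAP2.12 N terminus is less abundant than the untagged reporter under the same conditions.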

Yeast two-hybrid assay. The ProQuest™ Two-Hybrid System (Invitrogen) was 
used. pEXP™32/Krev1 and pEXP™22/RalGDS-wt were used as positive controls, 
and pDEST™32 and pDEST™22 as negative controls. S. cerevisiae strain 
MaV203 was transformed with the different combinations of bait, prey and control 
vectors (Supplementary Fig. 16). Colonies containing both vectors were selected by 
plating at 28 °C for 3 days on minimal selective dropout medium lacking Leu and 
Trp (SC-LW medium). To select colonies containing an interacting protein partner, 
they were subsequently replicated on selective dropout medium (SC-LWH+3AT 
medium) lacking Leu, Trp and His and supplemented with 10 mM 3-aminotriazole 
(3AT). The strength of the interaction was further verified by β-galactosidase 
staining (LacZ) following the manufacturer’s instructions. 

SC-LW, control medium without Leu and Trp; SC-LWH+3AT, selective medium 
without Leu, Trp, His and with 3AT. 

BiFC. In planta protein interactions were investigated with bimolecular fluor- 
escence complementation in an Arabidopsis transient expression system as 
described previously”. 

SDS-PAGE and western blotting. Protein samples from total tissue extracts were 
separated by SDS-PAGE on 10% acrylamide midigels (Bio-Rad) and then transferred 
onto a polyvinylidene difluoride membrane (Bio-Rad). Incubations with the 
antiserum and the secondary antibody conjugated to horseradish peroxidase 
(Agrisera) were performed following the method recommended for the ECL 
Plus western blotting detection system (GE Healthcare). 

Polyclonal anti-RAP2.12 antibodies were affinity purified at Genscript laboratories 
after being raised in rabbits against a RAP2.12/RAP2.2 specific synthetic 
peptide (NLKGSKKSSKNRSN). Lyophilized antibody was re-suspended to an 
approximate concentration of 1 µg ml⁻¹. A monoclonal antibody against 
Arabidopsis ACTIN-11 (Agrisera, AS10 702) was used to confirm equal loading 
and transfer. Densitometric analysis of the protein signals on the western blots was 
performed with the software package UVP VisionWorks LS (Ultra-Violet 
Products). Normalization was carried out using the ACTIN-11 signal and setting 
to 100 the relative protein signal value for each of the ‘air’ controls. 
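
The normalization described above amounts to the following calculation, shown here as a minimal Python sketch. The function name `relative_signal` and the example band intensities are hypothetical; the densitometry itself was performed with the software named in the text.

```python
def relative_signal(target, actin, air_target, air_actin):
    """RAP2.12 signal normalized to ACTIN-11, with the 'air' control set to 100.

    All four arguments are band intensities in arbitrary densitometric units.
    """
    if actin <= 0 or air_actin <= 0 or air_target <= 0:
        raise ValueError("band intensities must be positive")
    sample_ratio = target / actin
    control_ratio = air_target / air_actin
    return 100.0 * sample_ratio / control_ratio


# Hypothetical intensities (not data from the paper).
example_hypoxia = relative_signal(target=30.0, actin=10.0, air_target=20.0, air_actin=20.0)
example_air = relative_signal(target=20.0, actin=20.0, air_target=20.0, air_actin=20.0)
```

By construction the air control evaluates to 100, so values above 100 indicate more RAP2.12 per unit of actin than in air.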
Statistical analyses. Significant variations between genotypes or treatments were 
evaluated statistically by Sigmaplot using either a t-test, one-way or two-way 
ANOVA where appropriate. Mean values that were significantly different 
(P < 0.05) from the control or wild-type treatment are marked with an asterisk. 
The statistical evaluation of the microarray experiments is described earlier. 

carriers of unidentified infrared emission features 

Unidentified infrared emission bands at wavelengths of 3-20 
micrometres are widely observed in a range of environments in 
our Galaxy and in others. Some features have been identified as 
the stretching and bending modes of aromatic compounds, and 
are commonly attributed to polycyclic aromatic hydrocarbon 
molecules**. The central argument supporting this attribution is 
that single-photon excitation of the molecule can account for the 
unidentified infrared emission features observed in ‘cirrus’ clouds 
in the diffuse interstellar medium®. Of the more than 160 mol- 
ecules identified in the circumstellar and interstellar environ- 
ments, however, not one is a polycyclic aromatic hydrocarbon 
molecule. The detections of discrete and broad aliphatic spectral 
features suggest that the carrier of the unidentified infrared emis- 
sion features cannot be a pure aromatic compound. Here we report 
an analysis of archival spectroscopic observations and demonstrate 
that the data are most consistent with the carriers being amorphous 
organic solids with a mixed aromatic-aliphatic structure. This 
structure is similar to that of the organic materials found in 
meteorites, as would be expected if the Solar System had inherited 
these organic materials from interstellar sources. 
For the past 20 years, polycyclic aromatic hydrocarbon (PAH) mol- 
ecules have commonly been considered the carriers of unidentified 
infrared emission (UIE) features. This hypothesis assumes that the 
UIE features are the result of infrared fluorescence from small (~50- 
carbon-atom) gas-phase PAH molecules being pumped by far-ultraviolet 
photons’. In spite of its popularity, the PAH hypothesis does not provide 

Figure 1 | Mixed aromatic and aliphatic features in the infrared spectra of
circumstellar and interstellar nebulae. a-c, Spectral decompositions of the
UIE features of the planetary nebula NGC 7027 (a), the proto-planetary nebula
IRAS 22272+5435 (b) and the Orion bar photodissociation region (c), showing
a mix of aromatic, aliphatic and continuum features. The observed flux at
wavelength λ, F_λ, is proportional to the emission intensity at that wavelength. A
series of discrete features (black lines) and plateau features (orange lines; in
c, the 17-μm plateau represents the 15-20-μm range) superposed on a
continuum (blue line) have been fitted to the observed data. The UIE and
plateau features (wavelengths in micrometres), as well as some of the atomic
lines, are marked. The observed spectra are shown as solid red lines and the
fitted spectra are shown as dotted black lines. The origin of the 20.1-μm feature
in the IRAS 22272+5435 spectrum is currently unidentified. The spectral data
for NGC 7027, IRAS 22272+5435 and the Orion bar are retrieved from the
Infrared Space Observatory archive. For the spectral decomposition, we used
the IDL package PAHFIT originally developed to fit the Spitzer Space Telescope
Infrared Spectrograph spectra of nearby galaxies. The model spectra take into
account the contributions from the stellar continuum, the thermal dust
continuum, H2 emission, atomic emission lines, the UIE features (both
aromatic and aliphatic) and the plateau emission features. The optimal fitting to
the observed spectra is achieved through the Levenberg-Marquardt least-squares
algorithm. A modified blackbody model for the emission intensity at
wavelength λ, $I_\lambda \propto \lambda^{-\alpha} B_\lambda(T)$, where $B_\lambda(T)$ is the blackbody function with a
temperature T, is used to fit the continuum. The aromatic, aliphatic and plateau
features are fitted with assumed Drude profiles $I_\lambda \propto \gamma^2 [(\lambda/\lambda_0 - \lambda_0/\lambda)^2 + \gamma^2]^{-1}$,
where $\lambda_0$ is the central wavelength and γ is the fractional full-width at half-maximum
of each feature. The horizontal axis of each panel is wavelength (μm).
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a good explanation for the observed spectral behaviour. PAH molecules 
are fused ring molecules made up of carbon and hydrogen, and their 
vibrational bands are sharp and the peak wavelengths well defined. To fit 
the broad profiles of the UIE features seen in astronomical spectra, it is 
necessary to use a complex mixture of PAHs of different sizes, structures 
and charge states, and to utilize empirical feature profiles*’. Because PAH 
molecules require ultraviolet photons to excite them, they cannot explain 
the presence of UIE features in reflection nebulae” and proto-planetary 
nebulae’! where the central stars are cool and there is no ultraviolet 
background radiation. To account for these facts, the PAH model has 
to be revised to include large clusters and other ionization states. 
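
The two profile shapes named in the Figure 1 caption, the modified blackbody continuum and the Drude profile, can be sketched numerically. What follows is a minimal illustration in plain Python and not the PAHFIT code itself; the function names, the SI constants and the example parameters (a 7.7-μm feature with a fractional width of 0.05) are assumptions chosen here for illustration.

```python
import math

H = 6.62607015e-34   # Planck constant (J s)
C = 2.99792458e8     # speed of light (m/s)
K_B = 1.380649e-23   # Boltzmann constant (J/K)


def planck_lambda(lam_um, temp_k):
    """Blackbody function B_lambda(T) in SI units; wavelength in micrometres."""
    lam = lam_um * 1e-6
    x = H * C / (lam * K_B * temp_k)
    return 2.0 * H * C**2 / lam**5 / math.expm1(x)


def modified_blackbody(lam_um, temp_k, alpha):
    """Unnormalized continuum shape lambda^(-alpha) * B_lambda(T)."""
    return lam_um ** (-alpha) * planck_lambda(lam_um, temp_k)


def drude(lam_um, lam0_um, gamma, peak=1.0):
    """Drude profile peak * gamma^2 / [(lam/lam0 - lam0/lam)^2 + gamma^2]."""
    d = lam_um / lam0_um - lam0_um / lam_um
    return peak * gamma**2 / (d**2 + gamma**2)


def half_max_wavelengths(lam0_um, gamma):
    """Wavelengths at which the Drude profile falls to half of its peak."""
    s = math.sqrt(1.0 + gamma**2 / 4.0)
    return lam0_um * (s - gamma / 2.0), lam0_um * (s + gamma / 2.0)


# Hypothetical example: a feature centred at 7.7 um with gamma = 0.05.
lo, hi = half_max_wavelengths(7.7, 0.05)
fwhm_um = hi - lo  # equals gamma * lam0, so gamma is the fractional FWHM
```

For this functional form the half-maximum points are separated by exactly γλ0, which is why γ can be read as the fractional full-width at half-maximum.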

The central argument for the PAH hypothesis is that single-photon 
excitation of PAH molecules can account for the 12-μm excess emission
observed in cirrus clouds in the diffuse interstellar medium by the
Infrared Astronomical Satellite (IRAS). However, the UIE-band flux 
ratios in the diffuse H II regions in the Carina nebula are nearly constant
over a range of three orders of magnitude in background radiation’.
The shapes and peak wavelengths of the UIE features are
independent of the temperature of the central stars providing the 
excitation’®. Furthermore, PAH molecules have strong and narrow 
absorption features in the ultraviolet, but these are not observed in 
interstellar extinction curves’*. However, the 3.4-μm aliphatic carbon-hydrogen
stretching mode is commonly observed in absorption in the
diffuse interstellar medium’. Although their rotational and vibra- 
tional frequencies are well known, not a single PAH molecule has 
yet been identified in space’. 

Other arguments have been made to support the PAH hypothesis:
the asymmetric profiles of the UIE features can be explained
by anharmonicity associated with molecular emission, and the
observed feature-to-continuum ratio is high and therefore implies that
the carrier is a molecule’. Laboratory spectra of mixed aromatic and
aliphatic solid materials have asymmetric profiles’®, which can more
naturally explain the observations. Although the observed feature-to-continuum
ratio is high in the diffuse interstellar medium, it is not high
in the spectra of planetary and proto-planetary nebulae. Even in the
diffuse interstellar medium, the strength of the UIE features is
strongly correlated with the dust continuum, suggesting a possible
physical relationship between the two components”.

The basic premise of the PAH hypothesis does not concern the
chemical composition of the carrier so much as its size. As long as
the carrier is a nanoparticle that can undergo transient heating, it will
satisfy the excitation requirement. Laboratory experiments have
yielded carbon nanoparticles with structures of sp2 rings connected
by networks of aliphatic chains'*”’, as well as fullerene fragments
linked by aliphatic groups”’. These nanoparticles are likely to be constituents
in circumstellar and interstellar environments. Furthermore,
it has been proposed that the possible sudden release of chemical
energy as a source of transient heating of small grains will allow much
larger particles to radiate in the near-infrared, further weakening the
PAH hypothesis”.

An alternative explanation for the UIE bands is that they are emitted
by complex organic solids with disorganized structures. These solids
intrinsically have broad emission profiles, and the features often sit on
even broader emission plateaux several micrometres in width. It has
been argued for some time that the observed spectral properties of UIE
bands resemble those of coal and kerogen”*”’. Coal and kerogen are
amorphous organic solids with a mixed sp2-sp3 composition with
randomly oriented aromatic ring units linked by long, aliphatic chains.
Their mixed sp2-sp3 chemistry gives rise to the discrete aromatic and
aliphatic emission features and the broad plateau features’®.

Table 1 | Strengths of the UIE discrete and plateau features

Discrete features* (%)
                    Aromatic                                 Aliphatic       Unknown
λ (μm)              3.3     6.2     7.7     8.6     11.3     3.4     6.9     15.8    16.4    18.9
NGC 7027            0.32    1.2     2.8     0.58    3.1      0.07    0.11    0.31    0.84    0.0
IRAS 22272+5435†    0.08    0.05    0.30    0.11    3.76     0.15    0.43    0.94    0.41    -
Orion bar           0.67    3.8     7.0     2.2     2.6      0.13    0.37    0.24    2.0     0.0
V2361 Cygni         0.27‡   0.87§   0.60||  0.03    -        -       0.25    -       -       -
V2362 Cygni         -       5.2     2.4     0.8     -        -       3.2     -       -       -

Plateau features (%)
λ (μm)              8       12      17
NGC 7027            18.8    17.2    0.33
IRAS 22272+5435     12.5    18.6    -
Orion bar           9.3     15.1    0.78
V2361 Cygni         11.1    1.3     -
V2362 Cygni         17.1    1.2     1.1

Continuum
                    Percentage of    Temperature    α       Total flux (3-20 μm)
                    total flux       (K)                    (W m^-2)
NGC 7027            50.7             100            2       8.9 × 10^-11
IRAS 22272+5435     40.8             100            2       2.5 × 10^-11
Orion bar           47.3             70             2       2.4 × 10^-11
V2361 Cygni         84.8             350            0.2     1.8 × 10^-13
V2362 Cygni         66.8             365            0.5     2.1 × 10^-13

The UIE phenomenon is complex. In addition to the commonly observed 3.3-, 6.2-, 7.7-, 8.6- and 11.3-μm aromatic features, there are also aliphatic features at 3.4 and 6.9 μm, arising respectively from symmetric and
asymmetric carbon-hydrogen stretching and bending modes of methyl and methylene groups attached to aromatic rings. Features at 15.8, 16.4, 17.4 (not shown), 17.8 (not shown), and 18.9 μm have been found in
proto-planetary nebulae”, reflection nebulae*° and galaxies. In addition to the discrete features, broad emission features up to several micrometres in width are also seen. The 8- and 12-μm plateau features and a
broad feature covering the 15-20-μm range (represented by 17 μm in the table) have been detected in young stellar objects, compact H II regions and planetary nebulae. The 8- and 12-μm plateaux are broad
emission features (full-width at half-maximum, 2-4 μm) and can be identified as collective in-plane and, respectively, out-of-plane bending modes of a mixture of aliphatic side groups attached to aromatic rings?’.
The 15-20-μm plateau feature is also found to be strong in some proto-planetary nebulae. This table summarizes the relative contributions of these components of the spectra shown in Figs 1 and 3.

* The total fluxes and percentages refer to the values emitted in the 3-20 μm range for NGC 7027 and Orion bar, 5-20 μm for V2362 Cygni and V2361 Cygni, and 3-25 μm for IRAS 22272+5435.
† For IRAS 22272+5435, there are additional features that contribute to the flux in the 3-25-μm range. Features at 12.2, 13.4 and 20.1 μm contribute 3.83, 1.53 and 2.56%, respectively. Some of the contributions
also come from the broad 26-μm feature.
‡ This entry refers to the spectral feature at 5.3 μm.
§ This entry refers to the spectral feature at 6.3 μm.
|| This entry refers to the spectral feature at 7.2 μm.

To provide a quantitative comparison between the two models, we
have performed spectral decomposition of several sources with strong
UIE features. Figure 1 shows a fit to the infrared spectra of the planetary
nebula NGC 7027, the proto-planetary nebula IRAS 22272+5435 and
the Orion bar, a photodissociation region in the Orion nebula, using a
set of discrete UIE features, broad plateau features, and the underlying 
dust continuum. The breakdown of contributions to the total fluxes 
from various components is summarized in Table 1. The strongest 
component is the continuum, which contributes approximately half 
of the total fluxes emitted in the 3-20-μm region. The next strongest
are the plateau features, which account for 36, 31 and 25% of the total
fluxes from NGC 7027, IRAS 22272+5435 and the Orion bar, respectively,
compared with the totals of 8, 4 and 16% from the aromatic
features. The aliphatic branches probably constitute a significant frac- 
tion of the material in each of the three sources. 
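
The percentages quoted in this paragraph can be checked by summing the individual Table 1 entries. The sketch below does that arithmetic in Python; the numerical values are our reading of the Table 1 entries for the three nebulae (an assumption wherever the extracted table is unclear), and the dictionary and function names are illustrative only.

```python
# Percentages of total flux, as read from Table 1 (assumed readings).
AROMATIC_DISCRETE = {
    # features at 3.3, 6.2, 7.7, 8.6 and 11.3 um
    "NGC 7027": [0.32, 1.2, 2.8, 0.58, 3.1],
    "IRAS 22272+5435": [0.08, 0.05, 0.30, 0.11, 3.76],
    "Orion bar": [0.67, 3.8, 7.0, 2.2, 2.6],
}
PLATEAU = {
    # plateaux at 8, 12 and 17 um (no 17-um entry for IRAS 22272+5435)
    "NGC 7027": [18.8, 17.2, 0.33],
    "IRAS 22272+5435": [12.5, 18.6],
    "Orion bar": [9.3, 15.1, 0.78],
}


def rounded_totals(table):
    """Sum the listed percentages for each source and round to an integer."""
    return {source: round(sum(values)) for source, values in table.items()}


aromatic_totals = rounded_totals(AROMATIC_DISCRETE)
plateau_totals = rounded_totals(PLATEAU)
```

With these readings the aromatic totals come out at 8, 4 and 16% and the plateau totals at 36, 31 and 25%, matching the figures quoted in the text.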

The above fitting results show that the carrier of the UIE features 
includes a mixture of aromatic and aliphatic components, and is not a 
pure or predominantly aromatic compound. Because the carrier is 
formed from a mixture of cosmic gases, it is likely that the compound 
will include other abundant elements such as oxygen, nitrogen, sul- 
phur and so on, in addition to carbon and hydrogen. These impurities 
may also have spectral signatures that can be identified by observations 
at higher spectral resolution. A sketch of the proposed chemical struc- 
ture is shown in Fig. 2. 

The best way to study the origin of the UIE features is to observe 
them when they are formed. From observations of objects in the late 
stages of stellar evolution, we know that the UIE features develop in the 
circumstellar environment within a few hundred years after the ter- 
mination of the asymptotic giant branch™. Spectroscopic observations 
of novae have shown that the 3.3- and 3.4-μm features appear soon
after dust condensation”’. Theoretically, it is difficult to understand 
how complex organics can form under such low-density conditions, 
but novae are observed to change from a pure gas spectrum to a dust- 
dominated spectrum over the course of days”®. In Fig. 3, we show a fit 
to Spitzer spectra of the novae V2362 Cygni and V2361 Cygni. It is 
expected that a mixture of miscellaneous aliphatic branches will attach 
to the newly formed ring clusters. The prominence of the plateau 
features reflects this early stage of organic dust condensation”. 

We note that the dominant organic content in carbonaceous chon- 
drites is a kerogen-like macromolecular solid referred to as insoluble 
organic matter. Recent laboratory analysis of the insoluble organic 
matter in the Murchison meteorite has suggested that it has a chemical 
structure very similar to that which we propose here**”’. The presence 
of insoluble organic matter in meteorites is evidence that complex 
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Figure 2 | Proposed structure of the carrier of UIE features. The structure is 
characterized by a highly disorganized arrangement of small units of aromatic 
rings linked by different kinds of aliphatic chain. Other impurities such as 
oxygen, nitrogen and sulphur are also commonly present. This structure 
contains about 100 carbon atoms and a typical nanoparticle may consist of 
multiple structures similar to this one. 

Figure 3 | Emergence of complex organics after nova outburst. a, b, Fits to
the Spitzer Infrared Spectrograph spectra of novae V2361 Cygni (a) and V2362
Cygni (b) 251 and 446 days after their respective outbursts. In addition to the
gas emission line spectrum, both spectra have developed strong dust continua,
and the 8- and 12-μm plateau features are clearly present. The continua of
V2361 Cygni and V2362 Cygni are fitted by modified blackbody intensities of
the respective forms $\lambda^{-0.2} B_\lambda(350\,\mathrm{K})$ and $\lambda^{-0.5} B_\lambda(365\,\mathrm{K})$ (blue lines). The
orange lines are the 8-, 12- and 17-μm plateau features and the solid black lines
are discrete features at 5.3, 6.3, 6.9, 7.2 and 8.6 μm for V2361 Cygni and at 6.2,
6.9, 7.6, 7.8 and 8.6 μm for V2362 Cygni. The observed spectra are shown as
solid red lines and the fitted spectra are shown as dotted black lines. The
presence of the 8- and 12-μm plateau features suggests that the aliphatic
component is the first to emerge after dust condensation. Because of the large
number of emission lines, the atomic lines are not included in the fitting.


organic solids form in nature with no difficulty. The fact that insoluble 
organic matter and circumstellar dust have similar chemical structures 
offers the possibility that Solar System organics may have a stellar 
connection. 

Membrane protein sequestering by ionic protein-lipid interactions
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Neuronal exocytosis is catalysed by the SNAP receptor protein 
syntaxin-1A', which is clustered in the plasma membrane at 
sites where synaptic vesicles undergo exocytosis”’. However, how 
syntaxin-1A is sequestered is unknown. Here we show that 
syntaxin clustering is mediated by electrostatic interactions with 
the strongly anionic lipid phosphatidylinositol-4,5-bisphosphate 
(PIP2). Using super-resolution stimulated-emission depletion 
microscopy on the plasma membranes of PC12 cells, we found that 
PIP2 is the dominant inner-leaflet lipid in microdomains about 73 
nanometres in size. This high accumulation of PIP2 was required for 
syntaxin-1A sequestering, as destruction of PIP2 by the phosphatase 
synaptojanin-1 reduced syntaxin-1A clustering. Furthermore, co- 
reconstitution of PIP2 and the carboxy-terminal part of syntaxin- 
1A in artificial giant unilamellar vesicles resulted in segregation of 
PIP2 and syntaxin-1A into distinct domains even when cholesterol 
was absent. Our results demonstrate that electrostatic protein-lipid 
interactions can result in the formation of microdomains indepen- 
dently of cholesterol or lipid phases. 

Phosphoinositides are lipids that contain an inositol head group 
conjugated to one to three phosphate groups. With ~1% of total lipids 
in the inner leaflet of the plasma membrane’, PIP2 is the most abundant 
phosphoinositide. Earlier studies identified PIP2 as a second messenger 
in the phospholipase-C signalling pathway. However, the list of cellular 
functions of PIP2 is rapidly growing, and PIP2 is also involved in 
membrane targeting, cytoskeletal attachment, endocytosis and exocy- 
tosis*. PIP2 interacts with many different proteins, through either 
unstructured basic residue-rich regions or more-structured domains*”. 

Neuronal exocytosis requires the presence of PIP2 at the plasma 
membrane***. The amount of PIP2 at the plasma membrane deter- 
mines the rate of vesicle priming, the size of the readily releasable pool 
and the rate of sustained exocytosis in stimulated cells*®**. This regu- 
lation is probably mediated by interactions of PIP2 with proteins 
involved in docking and fusion such as rabphilin, CAPS, synaptotagmin, 
SCAMP2 and Mint proteins®’. In docking, PIP2 clusters may act as 
molecular ‘beacons’ that target synaptic vesicles to the fusion sites. 
Indeed, PIP2 is locally enriched at the sites of docked vesicles and co- 
localizes with at least 5-10% of the microdomains of syntaxin-1A*°” 
(Supplementary Fig. 1), the membrane-anchored target SNAP receptor 
of neuronal exocytosis’. 

The amount of PIP2 at the sites of membrane fusion in PC12 cells 
has been estimated at 3-6% PIP2 coverage of local cell surface area? 
(Supplementary Fig. 2). In these experiments, membrane sheets were 
specifically stained for PIP2 with the Pleckstrin homology domain of
phospholipase C delta fused to green fluorescent protein’ or citrine (a
yellow fluorescent protein analogue’’; PH_PLCδ-citrine; Supplementary
Figs 1 and 2), and the fluorescence of the punctate PIP2 microdomains
was quantified. However, this approach underestimates the
fraction of PIP2 if the PH_PLCδ microdomains are smaller than the
~200-nm diffraction-limited resolution of conventional fluorescence 


microscopy”. To obtain a more accurate estimate, we re-analysed PC12
membrane sheets labelled with PH_PLCδ-citrine or an antibody raised
against PIP2 using super-resolution stimulated-emission depletion
(STED) microscopy'' (Fig. 1a-c). These experiments revealed that
the clusters stained for PIP2 are much smaller than anticipated, with
an average diameter of only 73 ± 42 nm (s.d.). Although this is an
overestimate because it represents the microdomain size convolved 
with the resolution of the STED microscope (~60 nm), it is in good 
agreement with the size of the syntaxin-1A microdomains”. 
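
To give a feel for how much the ~60-nm STED resolution could inflate the measured 73-nm diameter, the following sketch removes the instrument blur in quadrature. It rests on an assumption that is not stated in the text, namely that both the microdomain profile and the point-spread function are Gaussian, so that their full-widths at half-maximum add in quadrature; the function name is hypothetical.

```python
import math


def deconvolved_fwhm(measured_nm, resolution_nm):
    """Estimate the true FWHM, assuming Gaussian object and Gaussian blur.

    For two Gaussians the widths add in quadrature on convolution, so
    true^2 = measured^2 - resolution^2.
    """
    if measured_nm <= resolution_nm:
        raise ValueError("measured width must exceed the resolution")
    return math.sqrt(measured_nm**2 - resolution_nm**2)


# Values quoted in the text: 73 nm measured, ~60 nm STED resolution.
estimated_true_nm = deconvolved_fwhm(73.0, 60.0)
```

Under this Gaussian assumption the underlying microdomains would be roughly 40 nm across, which illustrates why the measured value is described as an overestimate.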

Using a microdomain size of 73 nm, we recalculated the surface
density of PIP2 (Supplementary Methods). For this calculation, we 
first estimated the total amount of PIP2 in a microdomain when 
sampled at the diffraction-limited resolution of our epifluorescence 

Figure 1 | PIP2 is the predominant inner-leaflet lipid in roughly 73-nm-sized
microdomains. a, Confocal and corresponding nanoscale-resolution
STED image of a PH_PLCδ-citrine-stained membrane sheet of PC12 cells. Note
the increase in resolution. b, Same as a, but now immunostained with a
monoclonal PIP2 antibody and a secondary antibody labelled with Alexa Fluor
488. c, Size distribution of microdomains with PH_PLCδ-citrine (blue; n = 433,
24 sheets, two independent preparations) and PIP2 antibody (pink; n = 2,959,
22 sheets, two independent preparations). The average diameter (full-width at
half-maximum) was 73 ± 42 nm (s.d.) for microdomains with PH_PLCδ-citrine
and 87 ± 62 nm (s.d.) for microdomains with PIP2 antibody. d, Spatial
distribution of PIP2. Black: the PIP2 distribution when sampled at too low
diffraction-limited resolution (377 nm; Supplementary Fig. 2). Red:
approximation of the PIP2 distribution in the ~73-nm microdomains. PIP2
was accumulated over ~82% of the total surface area. See Supplementary
Methods for details.
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microscope (Fig. 1d, black mesh, and Supplementary Fig. 2). We then 
calculated the peak concentration when this amount of PIP2 was 
concentrated into 73-nm microdomains (Fig. 1d, red mesh). Here 
we assumed a Gaussian distribution of PIP2 in the microdomains. A 
peak surface coverage of 82% PIP2 was obtained (Fig. 1d). It must be 
kept in mind that at these high PIP2 concentrations, molecular crowding
might hinder binding of PH_PLCδ-citrine; that relatively small
errors in microdomain size and microscope resolution result in a
substantial error in the PIP2 coverage; and that PH_PLCδ-citrine and
antibody binding may alter PIP2 localization and are only indicative of
PIP2 microdomains. Nevertheless, the PIP2 concentrations that we 
calculate are much higher than any previous estimate, and it seems 
safe to conclude that PIP2 is the dominant inner-leaflet lipid in the 
microdomains. The question then arises of by which molecular mech- 
anism such high concentrations of PIP2 are achieved. 
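
The scaling behind this recalculation can be illustrated with a short sketch. It assumes, as the text does, a Gaussian distribution of PIP2, and further assumes that the same amount of PIP2 is simply redistributed from a 377-nm-wide spot (the diffraction-limited width given in the Figure 1d caption) into a 73-nm-wide one, so that the peak scales with the inverse square of the width. The actual calculation is in the Supplementary Methods and may differ; the function name is hypothetical.

```python
def peak_coverage(coverage_low_res, fwhm_low_res_nm, fwhm_domain_nm):
    """Peak surface coverage after concentrating a 2D Gaussian.

    A 2D Gaussian with a fixed integral has a peak proportional to
    1 / FWHM^2, so narrowing the spot raises the peak by the squared
    ratio of the two widths.
    """
    return coverage_low_res * (fwhm_low_res_nm / fwhm_domain_nm) ** 2


# Lower end of the earlier 3-6% estimate, rescaled from 377 nm to 73 nm.
peak = peak_coverage(0.03, 377.0, 73.0)
```

With a starting coverage of 3% this gives a peak of about 80%, close to the 82% reported. Because the width enters squared, a 10% error in the microdomain size changes the result by roughly 20%, which is the sensitivity cautioned about above.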

PIP2 has a net negative charge of −3 to −5 (ref. 4) and interacts with
polybasic stretches of amino acids**'*"*. Polycationic molecules with such
stretches, such as MARCKS, spermine and even pentalysine, can sequester PIP2
in amounts in excess even of monovalent anionic lipids*™*. Similar to
these molecules, syntaxin-1A also possesses a stretch of basic amino
acids. These residues are adjacent to the transmembrane domain 
and are in contact with the head groups of the phospholipids’>'® 
(Supplementary Fig. 3a). Indeed, it is well established that this conserved
stretch with five positive residues (260KARRKK265) interacts
with PIP2*!*7”. Removal of charge weakens this interaction (Supplementary
Fig. 3b, c), but syntaxin-1A remains capable of fusing
membranes even on removal of all five charges”"®. Because PIP2 co- 
localizes with at least a fraction of syntaxin-1A microdomains* 
(Supplementary Fig. 1), we speculated that their interaction might 
drive domain formation in a manner similar to that in various soluble 
lipid-binding proteins”’*. Two independent approaches were used to 
test this hypothesis: reconstitution in giant unilamellar vesicles'® 
(GUVs) and hydrolysis of PIP2 in PC12 cells using a membrane- 
targeted variant of the PIP2 phosphatase synaptojanin-1’. 

Syntaxin-1A clustered in a non-raft way in neutral cholesterol- 
phosphatidylcholine membranes’’. Here cholesterol clusters syntaxin- 
1A by competing for solvation by phosphatidylcholine. Indeed, a 
synthetic C-terminal peptide of syntaxin-1A (residues 257-288; 
3 mol%; Fig. 2 and Supplementary Fig. 4a) clustered in domains in >50% 
of GUVs composed of 1,2-dioleoyl-sn-glycero-3-phosphatidylcholine 
(DOPC) with 20 mol% cholesterol. This peptide contained both the 
polybasic juxtamembrane linker and transmembrane region and was 
amino-terminally labelled with either rhodamine red or Atto 647N. 
Analysis of fluorescence showed a (1.6 ± 0.2)-fold (s.d.; n = 18)
enrichment of syntaxin-1A(257-288) in these clusters, but this is an underestimate
limited by the optics. Negatively charged PIP2 or 1,2-dioleoyl-
sn-glycero-3-phosphatidylserine (DOPS) dispersed these clusters'’ 
(Fig. 2). Thus, although cholesterol competition might explain 
syntaxin-1A clusters that are not enriched in PIP2 (Supplementary 
Fig. 1), it cannot explain the high accumulation of PIP2 at the sites of 
docked vesicles. However, 1.5 mol% (total lipids) PIP2 also clustered 
syntaxin-1A in 1-10-μm domains in 1-5% of the GUVs (Fig. 2 and
Supplementary Fig. 4b, c). These domains did not depend on cholesterol
or DOPS. In these domains, PIP2 was (1.9 ± 0.2)-fold (s.d.; n = 13; Supplementary
Fig. 5) enriched and syntaxin-1A(257-288) was (5.5 ± 1.4)-fold
(s.d.; n = 27) enriched, calculated on the basis of fluorescence. Notably,
no domains were observed without peptide or when the PIP2 concen- 
tration exceeded 5 mol%. Divalent cations can act as bridges between 
two adjacent lipids and induce aggregation of PIP2 into clusters’”’, but 
even 1 mM Ca2+ was not sufficient to attract syntaxin-1A away from the
microdomains. Domains were present both for synthetic dioleoyl-PIP2 
and for PIP2 extracted from pig brain (Supplementary Fig. 4b). Thus, 
syntaxin-1A can be clustered in the membrane by both cholesterol and 
PIP2. 

These cholesterol- and PIP2-mediated clusters both differ from lipid 
‘rafts’. They also differ from each other. First, PIP2 domains are always 


2 | NATURE | VOL 000 | 00 MONTH 2011 



Figure 2 | Confocal microscopy of syntaxin-1A domains in artificial
membranes. Syntaxin-1A(257-288) (SxTMH; red) labelled with Atto 647N was
reconstituted in GUVs. The membranes were composed of DOPC with
1.5 mol% of the fluorescent lipid analogue 3,3’-dioctadecyloxacarbocyanine
(DiO; green) and the percentages DOPS, cholesterol and PIP2 indicated in the
figure. In the absence of anionic phospholipids, 20% cholesterol clustered
syntaxin-1A in many small clusters (condition 2), as predicted in ref. 17.
Inclusion of >5% anionic DOPS dispersed these clusters (condition 3). PIP2
(1.5%) partitioned SxTMH in 1-10-μm-sized domains regardless of cholesterol
or DOPS (conditions 4-8). These clusters were no longer observed with 5%
PIP2 (condition 9). The pink arrows show the part of the membrane used for
cross-sections. Yellow bars indicate the positions of the domains. More data are
presented in Supplementary Figs 4-9.


round and only one or two form per vesicle, whereas cholesterol 
generally (but not always) induces many small domains (Supplemen- 
tary Fig. 4). Second, fluorescence recovery after photobleaching 
showed that syntaxin-1A remained mobile in the PIP2 domains and 
that syntaxin-1A was essentially immobile in the cholesterol-dependent 
clusters (Supplementary Fig. 6). Syntaxin-1A thus diffuses in the PIP2 
domains and forms large circular domains to minimize boundary 
energy”. Third, 6-dodecanoyl-2-dimethylaminonaphthalene” (laurdan) 
showed that the PIP2 domains were highly hydrated, whereas the 
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Figure 3 | Removal of PIP2 reduces syntaxin-1A clustering in PC12 cells. 
a, Membrane sheets of PC12 cells stained with 1-(4- 
trimethylammoniumphenyl)-6-phenyl-1,3,5-hexatriene'*** (TMA-DPH). 
Immunostaining with a monoclonal antibody raised against syntaxin-1A and a
secondary antibody labelled with DyLight649 showed that endogenous 
syntaxin-1A clustered in microdomains (region b; pink)**”'?7*. 
Overexpressing the red-fluorescent-protein-tagged and membrane-targeted 
catalytic region of synaptojanin-1’ (residues 498-901; cell outlined in blue) 
reduced this syntaxin-1A clustering 3.7-fold (region c; orange; Supplementary 
Fig. 13). Synaptojanin-1 is the 5-phosphatase of PIP2 and overexpression of the 
construct completely removes PIP2 from the membrane (Supplementary Fig. 
12). b, c, Magnifications of regions b (b) and c (c). Cross-sections along the 
indicated cuts indicate the clustering. a.u., arbitrary units. 


cholesterol domains were much more densely packed (Supplementary 
Fig. 7). Fourth, phase contrast microscopy showed a thickening of the 
cholesterol-dependent clusters but not of the PIP2-domains (Sup- 
plementary Fig. 8). Thus, even though no saturated lipids are present, 
the cholesterol-dependent domains have behaviour that resembles the 
liquid ordered phase. In contrast, the PIP2 domains seem much more 
disordered and resemble the liquid disordered phase. Ca2+ demixing
of polyanionic amphiphiles showed that electrostatic interactions can 
indeed lead to liquid-like domains”. 

The transmembrane helix of syntaxin-1A has been reported to 
homodimerize. However, introducing the Met267Ala, Cys271Ala and 
Ile279Ala mutations that prevent homodimerization of the syntaxin- 
1A peptides” did not prevent cholesterol- or PIP2-mediated clustering 
(Supplementary Fig. 9). By contrast, no PIP2 domains were observed 
when two charges (Lys264Ala and Lys265Ala) from the polybasic linker 


" DOPC DOPS 


Figure 4 | Simulations of the dynamic and amorphous PIP2/syntaxin-1A 
microdomains. a, b, Side view (a) and top view (b) of a coarse-grained 
molecular dynamics simulation. Sixty-four copies of syntaxin-1A 57 2g 
(SxTMH) and 64 copies of PIP2 were incorporated in a bilayer composed of 
DOPC and DOPS in a 4:1 molar ratio. PIP2 was present only in the membrane 
leaflet facing the N-terminus of syntaxin-1A 57 2g. Simulations were 
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were removed, but cholesterol-dependent clusters were still observed 
(Supplementary Fig. 5). Overexpression of the C-terminal part of 
syntaxin-1A fused to green fluorescent protein™ in PC12 cells also 
showed 4-8-fold loss of clustering of the Lys264Ala Lys265Ala mutant 
(Supplementary Figs 10 and 11). These data show that electrostatic 
interactions between PIP2 and the juxtamembrane helix of syntaxin- 
1A are sufficient for domain formation. 

We then set out to investigate to what extent PIP2 is required for 
syntaxin-1A clustering in PC12 cells. For this purpose, we expressed a 
red-fluorescent-protein-tagged construct containing the phosphatase 
domain of synaptojanin-1 fused to a CAAX box, resulting in its targeting
to the plasma membrane’. Synaptojanin-1 is a polyphosphoinositide 
5-phosphatase, and the expression of the construct completely removed 
PIP2 from the plasma membrane®’ (Supplementary Fig. 12). Notably, 
synaptojanin-1 expression reduced the punctate distribution
of endogenous syntaxin-1A 3.7-fold (Fig. 3 and Supplementary Fig. 13). Thus,
this provides evidence that PIP2 is indeed required for at least part of 
syntaxin-1A microdomain formation. 

We performed molecular dynamics simulations to gain insight into 
the precise conformation of the PIP2/syntaxin-1A microdomains. In 
these coarse-grained simulations, several atoms were represented by 
one simulation bead’**’* (Supplementary Fig. 14). This allowed for 
simulations of relatively large lipid bilayers comprising ~2,500 copies 
of DOPC and DOPS in a 4:1 molar ratio and 40-64 copies of
syntaxin-1A(257-288) and PIP2. In a simulation time of 10 μs, up to ten copies of
syntaxin-1A(257-288) clustered with PIP2 into microdomains (Supplementary
Fig. 15). Equal amounts of PIP2 and syntaxin-1A were
present in the bulk phase of those domains, whereas more PIP2 and 
DOPS associated transiently to the periphery. We used this informa- 
tion to construct a domain with 64 copies of syntaxin-1A (Fig. 4 and 
Supplementary Movie 1), which is comparable to the syntaxin-1A 
content in the microdomains in PC12 cells’. These domains were 
stable over a simulation time of 6 μs and contained <10% residual
DOPC or DOPS. We conclude that syntaxin-1A and PIP2 can form 
dynamic, amorphous networks with PIP2 acting as a ‘charge bridge’ 
spanning the distance between the various syntaxin-1A molecules 
(Fig. 4c). 

Our findings show that electrostatic interactions between the mem- 
brane lipid PIP2 and the SNAP receptor syntaxin-1A suffice to induce 
membrane sequestering and microdomain formation without the 
need for high local PIP2 production or a (complex) ‘molecular fence’ 
restricting PIP2 and protein diffusion’’. This does not exclude 
an additional role for protein-protein interactions between either 


performed with 150 mM NaCl. See Supplementary Methods for details. White,
lipid alkyl chain; cyan, DOPS head group; grey, DOPC head group; yellow,
syntaxin-1A(257-288) transmembrane region (residues 266-288); blue-red,
polybasic linker region (residues 257-265; charges in red); orange-blue, anionic
PIP2 head group (charges in blue). The domains were stable over a simulation
time of 6 μs (Supplementary Movie 1). c, Simplified scheme of the cluster.
transmembrane helices or soluble domains. In fact, these seem essential
for segregation of proteins with similar structures and sizes, such as
syntaxin-1A and syntaxin-4 (both have polybasic regions and cluster
separately). The mutual enrichment of syntaxin-1A and PIP2 at the 
fusion sites by electrostatic interactions has clear advantages. First, accumulation
of syntaxin-1A may facilitate SNAP receptor interactions and
thereby increase the membrane fusion efficiency. Second, the lipid
environment modulates the energetic requirements for fusion.
Third, both PIP2 and syntaxin-1A function as molecular docking sites
and facilitate assembly of the complete fusion machinery. Our
findings that electrostatic protein-lipid interactions are sufficient for
membrane sequestering indicate that such interactions constitute a
mechanism for the formation of protein microdomains in the membrane
that is clearly distinct from protein partitioning by means of the
well-established lipid phases.


METHODS SUMMARY 


PH(PLCδ)-citrine was expressed in Escherichia coli and purified using His-tag affinity
purification. PC12 cells were maintained and propagated as described in refs 3,
24. PC12 cells were transfected using Lipofectamine LTX (Invitrogen). Membrane 
sheets were prepared by rupturing the cells with probe sonication as described in 
ref. 24, and immunostaining and microscopy were performed as described in
the corresponding references. The peptides were synthesized by microwave- 
assisted Fmoc solid-phase synthesis. Peptides were mixed with lipids in organic 
solvent and GUVs were formed by the drying rehydration procedure. The molecular 
dynamics simulations were performed with the GROMACS simulation package and 
the MARTINI coarse-grained model. See Supplementary Methods for details.
Image-based genome-wide siRNA screen identifies
selective autophagy factors 
Selective autophagy involves the recognition and targeting of specific
cargo, such as damaged organelles, misfolded proteins, or
invading pathogens for lysosomal destruction. Yeast genetic
screens have identified proteins required for different forms of 
selective autophagy, including cytoplasm-to-vacuole targeting, 
pexophagy and mitophagy, and mammalian genetic screens have 
identified proteins required for autophagy regulation. However,
there have been no systematic approaches to identify molecular 
determinants of selective autophagy in mammalian cells. Here, to 
identify mammalian genes required for selective autophagy, we per- 
formed a high-content, image-based, genome-wide small interfer- 
ing RNA screen to detect genes required for the colocalization of 
Sindbis virus capsid protein with autophagolysosomes. We iden- 
tified 141 candidate genes required for viral autophagy, which were 
enriched for cellular pathways related to messenger RNA proces- 
sing, interferon signalling, vesicle trafficking, cytoskeletal motor 
function and metabolism. Ninety-six of these genes were also 
required for Parkin-mediated mitophagy, indicating that common 
molecular determinants may be involved in autophagic targeting of 
viral nucleocapsids and autophagic targeting of damaged mitochon- 
dria. Murine embryonic fibroblasts lacking one of these gene pro- 
ducts, the C2-domain containing protein, SMURF1, are deficient 
in the autophagosomal targeting of Sindbis and herpes simplex 
viruses and in the clearance of damaged mitochondria. Moreover, 
SMURF1-deficient mice accumulate damaged mitochondria in the 
heart, brain and liver. Thus, our study identifies candidate determinants
of selective autophagy, and defines SMURF1 as a newly recognized
mediator of both viral autophagy and mitophagy.

To identify novel genes required for selective autophagy, we per- 
formed a genome-wide siRNA screen to detect changes in the coloca- 
lization of a red-labelled Sindbis virus (SIN) capsid protein with a 
green fluorescent protein (GFP)-labelled marker of autophagosomes, 
GFP-LC3 (LC3 is also known as MAP1LC3) (Supplementary Fig. 1a) 
in SIN-infected HeLa/GFP-LC3 cells (ref. 6). Using correlative light 
and electron microscopy, we confirmed that colocalized red and green 
puncta represented autophagic structures (primarily autolysosomes) 
containing numerous viral nucleocapsids (Fig. 1a). The predominance 
of viral nucleocapsids concentrated in these structures (relative to 
within the cytoplasm) is consistent with selective autophagic targeting 
of viral nucleocapsids (herein referred to as virophagy). 

Screening of a human siGenome library containing 21,215 siRNA 
pools showed that knockdown of 195 and 13 genes resulted in decreased
or increased colocalization, respectively (Fig. 1b, Supplementary
Table 1 and Supplementary Fig. 1b). Genes were re-screened with sets 
of four individual siRNAs (Supplementary Table 2; see column ‘J’ of 
Supplementary Table 3 for siRNA sequences) to confirm our primary 
screen and rule out potential off-target effects of individual siRNAs; 
knockdown with two or more siRNAs resulted in decreased colocalization
for 141 (72%) genes (Fig. 2 and Supplementary Figs 1b and 2a).
None of these 141 gene knockdowns decreased numbers of green 
puncta in uninfected cells (data not shown), indicating that these genes 
function in virophagy, but not in regulation of autophagy. There was no 
enrichment of siRNAs containing microRNA seed sequences among
these genes (P = 0.95) (Supplementary Tables 3 and 4), indicating that 
bias due to miRNA-like off-target effects was unlikely. There was a low 
confirmation rate for siRNAs that increased colocalization (2 of 13 
genes); therefore, we subsequently focused only on siRNAs that 
decreased colocalization. 
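
The confirmation rule described above (a gene counts as a confirmed hit when two or more of its four individual siRNAs score positive) can be sketched as follows. This is a minimal illustration, not the authors' analysis pipeline: the gene names and per-siRNA calls are hypothetical placeholders, and only the threshold of two and the counts 141 and 195 come from the text.

```python
def confirmed_hits(calls, min_positive=2):
    """Return genes with at least `min_positive` positive individual siRNAs.

    `calls` maps a gene name to a list of booleans, one per individual siRNA
    (four per gene in the confirmation screen described in the text).
    """
    return sorted(gene for gene, flags in calls.items()
                  if sum(flags) >= min_positive)


# Hypothetical per-siRNA calls for three placeholder genes.
example_calls = {
    "GENE_A": [True, True, False, False],   # 2 of 4 positive: confirmed
    "GENE_B": [True, False, False, False],  # 1 of 4 positive: not confirmed
    "GENE_C": [True, True, True, True],     # 4 of 4 positive: confirmed
}

hits = confirmed_hits(example_calls)

# Confirmation rate reported in the text: 141 of 195 primary hits.
confirmation_rate_percent = round(100 * 141 / 195)
```

Requiring agreement between independent siRNAs against the same gene is what guards against off-target effects of any single sequence, which is the stated purpose of the re-screen.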

Bioinformatic analyses of the 141 confirmed hits required for SIN 
capsid/GFP-LC3 colocalization showed enrichment for gene sets 
associated with biological processes and molecular functions including 
RNA splicing/processing, protein phosphorylation, transport, calcium- 
binding and the cytoskeleton (Supplementary Table 5 and Supplemen- 
tary Fig. 3a). Examination of our hits within a framework of functional 
cellular pathways revealed strongly enriched network modules associated
with RNA processing, interferon (IFN)-α and -γ signalling,
SNARE vesicular transport, cytoskeletal-associated components, and 
several metabolic pathways (Supplementary Fig. 3b and Fig. 1c). This is 
consistent with the function of IFN-γ in selective microbial autophagy
and the described role of the actin cytoskeleton in selective autophagy
in yeast and mammalian cells. The enrichment of SNARE proteins
suggests that in addition to a function in autophagosome formation
and maturation, these proteins may be involved in the trafficking of
selective cargo to the autophagosome. Twelve colocalization hits form 
primary interactions with core autophagy machinery and associated 
components (Fig. 1d). One colocalization screen hit, clathrin interaction
1 (CLINT1), interacts with ATG8 components (GABARAPL1,
MAP1LC3A, MAP1LC3B), which are crucial in the recognition of
cargo during selective autophagy. Another hit, ATG13, is a member
of the core autophagy network, indicating that it may have an as yet
undefined function in selective autophagy, in addition to its role in
the ULK1 (also known as Atg1) autophagy induction complex. Five
colocalization hits, SMURF1, NEFM, KCNAB2, SFRS4 and UBA52, 
interact with p62 (also known as SQSTM1), a known adaptor in diverse 
Figure 1 | Genome-wide screen to identify cellular factors required for
Sindbis virus capsid colocalization with autophagosomes. a, Correlative 
light and electron microscopy of HeLa/GFP-LC3 cell infected with SIN/ 
mCherry.capsid virus. Top left, deconvolved image of red and green 
fluorescence channels; middle, three-dimensional surface reconstruction of red 
and green channels; right, yellow (red + green) colocalization channel. Arrows 
denote yellow puncta that correspond to ‘1’ and ‘2’ in electron microscopy 
images below. Bottom left, electron microscopy of identical cell; middle and 
right, high magnification images of insets ‘1’ and ‘2’. Scale bars, left, 10 µm;
middle and right, 200 nm. b, Ranked distribution of median Z-scores for each 
siRNA pool in primary colocalization (virophagy) screen. Red, decreased 
colocalization; green, increased colocalization; blue, insufficient numbers of 
green or red puncta per cell or total number of cells per well for analysis. c, Maps 
of protein interactions in enriched network modules (see Supplementary Fig. 
3b). d, Association of siRNA hits with autophagy network. 


forms of selective autophagy, including SIN capsid targeting to autophagosomes.

Selective SIN autophagy (virophagy) promotes the survival of SIN- 
infected cells. To determine if our identified candidate virophagy
genes have a similar function, we screened our confirmation siRNA
library for genes that decreased cell survival after SIN infection. Two or 
more siRNAs targeting 98 of the genes decreased cell survival after SIN 
infection (Fig. 2, Supplementary Tables 3 and 6 and Supplementary 
Figs 1b and 2a); colocalization and cell survival effects of individual 
siRNAs were significantly correlated (Supplementary Fig. 2b) 
(P = 3.8 × 10⁻⁸, Spearman correlation). This is consistent with a
pro-survival function of autophagic targeting of SIN capsid in virally 
infected cells.
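
The Spearman correlation used above to compare the colocalization and cell-survival effects of individual siRNAs is the Pearson correlation computed on ranks. The sketch below is a dependency-free illustration with made-up effect values; it is not the authors' code, it does not reproduce their data, and it computes only the coefficient, not the P value.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-siRNA effect sizes (arbitrary units), for illustration only.
colocalization_effect = [-2.1, -1.4, -0.3, 0.2, -1.9]
survival_effect = [-1.8, -1.0, -0.1, 0.4, -1.5]
rho = spearman_rho(colocalization_effect, survival_effect)
```

Because the statistic depends only on rank order, it suits screen readouts measured on different scales, such as puncta counts and survival scores.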

To investigate whether the identified candidate virophagy genes 
also function in other forms of selective autophagy, we performed a 
secondary screen for autophagy of damaged mitochondria (mito- 
phagy). We used HeLa cells that express an mCherry fusion of 
Parkin, a cytosolic E3 ubiquitin ligase that translocates to depolarized 
mitochondria to induce mitophagy after treatment with uncoupling 
agents (such as CCCP, carbonyl cyanide m-chlorophenylhydrazone).
Of the 141 confirmed colocalization hits, 2 or more siRNAs targeting 
96 (68%) genes decreased mitophagy (Fig. 2, Supplementary Tables 3 
and 7 and Supplementary Figs 1b and 2a, b). Host factors involved in 
viral autophagy and mitophagy overlapped significantly (P = 0.019, 
Spearman correlation). The minority of genes that only scored positive 
in either the virophagy confirmation or the mitophagy secondary 
screen may have a role in targeting some, but not other cargoes, for 
selective autophagy; however, the lack of overlap may also reflect 
different sensitivities of the two screens. Mitophagy hits consisted of 
several mitochondria-associated components (NME2, MDH1,
NTHL1, PDK1, COX8A, MRPS2, MRPS10, NDUFB9 and BLOC1S1)
and interactors of mitochondria-associated components (Supplemen- 
tary Fig. 4). 

We focused further on one gene, SMURF1 (SMAD specific E3 
ubiquitin protein ligase 1), encoding a HECT-domain ubiquitin ligase 
that targets several cytoplasmic proteins for degradation. SMURF1
was a confirmed hit in all three confirmation or secondary screening 
assays (see Supplementary Fig. 5 for representative raw data from 
colocalization confirmation screen), is present in two of the enriched 
networks (mRNA processing and actin cytoskeleton) (Fig. 1c), and is a 
predicted interacting partner of the autophagy adaptor, p62 (refs 2, 4, 
Fig. 1d). 

We confirmed that SMURF1 is not required for general autophagy, 
but is a bona fide mediator of selective autophagy, including virophagy 
and mitophagy. siRNA knockdown of SMURF1 in HeLa cells, unlike 
knockdown of the essential autophagy protein, ATG7, did not alter 
general starvation-induced autophagy (Supplementary Fig. 6a). 
Furthermore, Smurf1-/- murine embryonic fibroblasts (MEFs) had
normal levels of starvation-induced LC3-II (lipidated form of
MAP1LC3) conversion, p62 degradation, and ultrastructural evidence
of autophagosome and autolysosome accumulation (Fig. 3a, b). 

However, a significant decrease in SIN/mCherry.capsid/GFP-LC3 
colocalization was observed in SIN-infected Smurf1-/- MEFs (Fig.
3c, d). Similar to p62 (ref. 6), SMURF1 and SIN capsid protein co-
immunoprecipitate in SIN-infected MEFs and HeLa cells (Supplementary
Fig. 7a, b). SMURF1 is not required for the interaction
between p62 and SIN capsid (Supplementary Fig. 7c). The interaction
between SMURF1 and SIN capsid may be relevant for targeting SIN
capsid for autophagosomal degradation, as levels of SIN capsid were
increased in Smurf1-/- MEFs and SMURF1 siRNA-treated HeLa cells
(Supplementary Fig. 7d-f). Increased SIN capsid levels in Smurf1-
deficient cells cannot be explained by increased capsid production
because viral growth was similar in Smurf1-/- and wild-type MEFs
(Supplementary Fig. 7g, h), or by changes in proteasomal degradation 
because SIN capsid levels were not altered by treatment with the 
proteasome inhibitor MG132 (Supplementary Fig. 7d), and SIN 
capsid ubiquitination was not detected (data not shown). SIN-infected 
Smurf1-/- MEFs had accelerated cell death (despite similar viral
titres) as compared to wild-type controls (Fig. 3g). Thus, SMURF1 
interacts with SIN capsid, SMURF1 is required for SIN capsid 
Figure 2 | Gene list for viral capsid/autophagosome colocalization (C)
confirmation screen and secondary screens for survival of virus-infected
cells (S) and Parkin-mediated mitophagy (M). Shown are the numbers of
individual siRNAs from a pool of four targeting each gene that scored positive
in each screen. Red, genes with 2 or more positive siRNAs (confirmed hits);
green, genes with <2 positive siRNAs. See Supplementary Tables 2-8 for
further details.

Figure 3 | SMURF1 functions in virophagy but not in starvation-induced
autophagy. a, Electron microscopy analysis of wild-type (WT) and Smurf1-/-
(KO) MEFs grown in normal media or EBSS (starvation) for 4 h. Arrowheads,
representative autophagosomes; arrows, representative autolysosomes. Scale
bars, 500 nm. b, Western blot analyses of LC3-I/II (non-lipidated and lipidated
forms of MAP1LC3, respectively) and p62 levels in MEFs of indicated
genotype. c, Quantification of colocalization of SIN/mCherry.capsid and GFP-
LC3 in indicated MEFs 16 h after SIN/mCherry.capsid/GFP-LC3 infection.
Data shown represent mean ± s.e.m. number of colocalized red and green
puncta per cell for 50 cells per well in triplicate samples. *P < 0.001 against
wild-type, Student's t-test. d, Representative confocal micrographs of images
used for quantification in c. Arrows, colocalized red and green puncta. Scale
bar, 15 µm. e, Representative electron microscopy images of indicated MEFs
infected with HSV-1 (strain 17termA). White arrowheads, partially degraded
viral nucleocapsids inside autolysosomes; black arrowheads, intact viral
nucleocapsids inside viral vesicles. Scale bar, 0.5 µm. For a-e, similar results
were obtained in 3-5 independent experiments.


targeting to autophagosomes and degradation through a proteasome- 
independent pathway, and SMURF1-dependent degradation of SIN
capsid promotes cell survival. 

To determine whether SMURF1 is required for the autophagic 
targeting of other viruses, we performed electron microscopy of
wild-type and Smurf1-/- MEFs infected with a mutant strain of herpes
simplex virus type 1 harbouring a deletion of ICP34.5, a potent inhibitor
of viral autophagy (Fig. 3e). As reported, the majority of cytoplasmic
HSV-1 virions in wild-type MEFs were inside autolysosomal
structures and appeared partially degraded. In contrast, in Smurf1-/-
MEFs, the majority of cytoplasmic HSV-1 virions were inside single-
membraned vesicles involved in HSV-1 cytoplasmic egress and had an
intact structure. This lack of autophagic targeting of HSV-1 in
Smurf1-/- MEFs was not due to a general defect in autophagy, because
HSV-1 infection induced autophagy similarly in Smurf1-/- and
wild-type MEFs (Supplementary Fig. 6b). Thus, SMURF1 is required for the
autophagic targeting of both a positive-strand RNA (Sindbis) and a 
double-stranded DNA (herpes simplex) virus. 

Next, we examined the role of SMURF1 in mitophagy. In HeLa cells,
all four SMURF1 siRNAs decreased SMURF1 protein expression (Supplementary
Fig. 8a) and inhibited Parkin-mediated CCCP-induced
mitophagy as effectively as an siRNA targeting p62, a mediator of
mitophagy in some previous reports, and siRNA targeted against
the essential autophagy gene, ATG7 (Supplementary Fig. 8b, c). The 
magnitude of each individual siRNA’s effect on SMURF1 protein 
expression knockdown correlated with the magnitude of inhibition 
of Parkin-mediated autophagy. Therefore, in the mitophagy confirma- 
tion screen, two of the SMURF1 siRNAs were probably false negatives; 
indeed, the number of Parkin-expressing cells in wells treated with 
these siRNAs was low (data not shown), precluding meaningful 
statistical analyses. A similar finding was true in the viral colocalization 
screen. 

We further examined the role of SMURF1 in mitophagy by assessing 
mitochondrial clearance in CCCP-treated Smurf1-/- MEFs. Unlike in
HeLa cells, Parkin overexpression did not promote mitophagy in MEFs
of either genotype (data not shown). However, 25-30% of MEFs
treated with 10 µM CCCP showed changes in mitochondrial morphology.
In wild-type MEFs with damaged mitochondria (swollen or fragmented
appearance), partial mitochondrial clearance occurred with
compaction of the remaining mitochondria around the nucleus
(Fig. 4a). In contrast, in Smurf1-/- cells with damaged mitochondria,
virtually no mitochondrial clearance occurred and there was diffuse 
accumulation of fragmented mitochondria throughout the cytoplasm 
(Fig. 4a, arrows). This phenotypic difference was confirmed using 
two independent methods of quantification, including assessment 
of the total percentage of CCCP-treated cells that displayed diffuse 
accumulation of abnormal mitochondria (Fig. 4b) and the measurement
of fractional mitochondrial surface area per cell (Fig. 4c).

To evaluate the mechanism of action of SMURF1, we compared the 
effects of wild-type and mutant SMURF1 expression plasmids on rescue
of selective autophagy in Smurf1-/- MEFs (Fig. 4a-c). We focused on
mitophagy rather than SIN capsid virophagy because of the resistance of
MEFs to SIN infection after plasmid transfection. The defect in mitophagy
in Smurf1-/- MEFs was partially rescued by wild-type SMURF1
transfection. SMURF1ΔHECT, lacking the HECT domain that catalyses
ubiquitin ligation onto target proteins, or SMURF1(C699A), a
catalytically inactive point mutant, rescued the mitochondrial clearance
defect as efficiently as wild-type SMURF1. Thus, in addition to its
known role in targeting proteins for proteasomal degradation via ubiquitination,
SMURF1 has a ubiquitin ligase activity-independent function
in mediating the selective degradation of damaged mitochondria.

In contrast, a SMURF1 mutant lacking the C2 domain, SMURF1ΔC2,
was completely defective in mitophagy rescue in CCCP-treated
Smurf1-/- MEFs (Fig. 4a-c), despite similar levels of expression as
transfected wild-type SMURF1 (Supplementary Fig. 9a). The C2 
domain of SMURF1 was not required for SMURF1 co-immunopreci- 
pitation with p62 (Supplementary Fig. 9b), indicating that SMURF1 
does not function in selective autophagy by recruiting p62. C2 domains 
(including those of protein kinase C and SMURF1) bind membrane 
phospholipids and function in protein targeting to the plasma membrane
and/or membrane subcellular compartments. This raised the
possibility that SMURF1 might function in the targeting of selective 
autophagy cargo through interaction with the nascent autophagosome 
membrane. 
Figure 4 | SMURF1 functions in mitophagy. a, Representative mitochondrial
morphology in Smurf1+/+ (wild-type) and Smurf1-/- (KO) MEFs transfected
with indicated construct and treated with DMSO or 10 µM CCCP for 24 h.
b, Quantification of percentage of total cells with a diffuse accumulation
of abnormal fragmented mitochondria and lack of mitochondrial clearance.
Results shown represent combined data from 3-5 experiments per condition
with triplicate wells (of at least 100 cells per well) analysed for each
condition per experiment. Shown are mean ± s.e.m. for average values from
each experiment. Similar results were observed in each independent
experiment. *P < 0.001, Student’s t-test. c, Measurement of mitochondrial
fractional area (percentage of total cellular area) in MEFs treated as in a.
Results shown represent mean ± s.e.m. for 50 cells per condition.
d, Representative confocal micrographs of KO MEFs transfected with
YFP-SMURF1 wild-type or YFP-SMURF1ΔC2 (ΔC2) and treated for 4 h with
DMSO or CCCP. e, Representative confocal micrographs of KO MEFs
transfected with GFP-LC3 and wild-type mCherry-SMURF1 (WT) or
mCherry-SMURF1ΔC2 (ΔC2) and treated for 4 h with CCCP. Inset, upper right,
formation of completed autophagosome around a damaged mitochondrion
associated with wild-type SMURF1; insets, lower right, incomplete
autophagosomes or absence of LC3 signal around mitochondria associated with
SMURF1ΔC2. See also Supplementary Figs 10 and 11 for enlarged images.
f, Representative electron microscopy images of indicated tissues from
10-month-old Smurf1+/+ (WT) or Smurf1-/- (KO) mice. Arrows, representative
abnormal mitochondria. Scale bars, 1 µm. Similar abnormalities were
observed throughout entire electron microscopy tissue section for each
mouse and in tissue samples from three different mice for each genotype.


To investigate this possibility, we examined the subcellular localization
of wild-type SMURF1 and SMURF1ΔC2 with damaged mitochondria
and autophagosomes (Fig. 4d). In Smurf1-/- MEFs
transfected with wild-type, yellow fluorescent protein-conjugated
YFP-SMURF1, CCCP treatment induced the colocalization of YFP-
SMURF1 with damaged mitochondria. In Smurf1-/- MEFs transfected
with YFP-SMURF1ΔC2, increased numbers of fragmented
and swollen mitochondria were observed in basal conditions and these
increased further upon CCCP treatment. These abnormal mitochondria
colocalized with YFP-SMURF1ΔC2, whereas normal reticular-
appearing mitochondria rarely colocalized with YFP-SMURF1ΔC2.
YFP-SMURF1(C699A) displayed the same subcellular staining
pattern as wild-type YFP-SMURF1 (data not shown). Thus,
SMURF1 colocalizes with damaged mitochondria in a C2 domain-
independent manner.

We next determined whether the C2 domain of SMURF1 was 
required for the colocalization of damaged mitochondria with autopha- 
gosomes (Fig. 4e and Supplementary Figs 10 and 11). In cells expressing 
wild-type mCherry-SMURF1, mitochondria were mostly compacted 
around the nucleus, and numerous autophagosomes were observed sur- 
rounding structures that labelled positive for both mCherry-SMURF1 
and the mitochondrial marker, Tom20 (also known as TOMM20). 
In contrast, in cells expressing mCherry-SMURF1ΔC2, mCherry-
SMURF1ΔC2- and Tom20-positive mitochondria were rarely found
inside autophagosomes. In many regions, GFP-LC3-positive linear or
cup-shaped structures were observed near mCherry-SMURF1ΔC2-
positive mitochondria, but complete autophagosomes surrounding these
mitochondria could not be detected. Thus, the C2 domain of SMURF1 is 
not required for its targeting to damaged mitochondria, but is required
for damaged mitochondria to be normally engulfed by autophagosomes. 
It is not yet known whether this requirement reflects a direct role for the 
C2 domain in binding to autophagosomal membrane phospholipids or 
is a more indirect consequence of other, as-yet-undescribed, effects of the 
C2 domain in mitophagy. 

To investigate whether SMURF1 may function in selective autophagy
in vivo, we performed electron microscopy analyses of cerebellum,
liver and hearts of 10-month-old wild-type and Smurf1-/-
mice. In all three organs, Smurf1-/- mice showed an accumulation
of abnormal mitochondria that were swollen, fragmented, and/or con- 
tained abnormal cristae (Fig. 4f). This phenotype is consistent with a 
defect in mitophagy and mitochondrial quality control; however, we 
cannot rule out unknown triggers of mitochondrial damage in these 
animals. In the livers of Smurf1-/- mice, mitochondria were spatially
disorganized and surrounded by networks of dilated endoplasmic
reticulum, perhaps reflecting a defect in mitochondrial targeting by
isolation membranes (which are believed to originate from the endoplasmic
reticulum) and/or a defect in selective autophagy of the endoplasmic
reticulum. There was a marked accumulation of lipid droplets
in the livers of Smurf1-/- mice (Supplementary Fig. 12a), which may
be consistent with selective degradation of lipid droplets by autophagy
(lipophagy) in hepatocytes. Furthermore, the granule cell layer of
the cerebellum and cardiomyocytes of Smurf1-/- mice had increased
numbers of p62 aggregates (Supplementary Fig. 12b). Unlike findings
in brains and hearts of mice lacking core autophagy genes, p62
aggregate accumulation in these tissues was not associated with ubiquitin
accumulation. This is consistent with a role for SMURF1 in
selective autophagy but not in the form of basal autophagy that is
involved in protein quality control.

Together, our data in Smurf1-/- MEFs and in Smurf1-/- mice
suggest a crucial function for SMURF1 in selective autophagy, includ- 
ing in the autophagic targeting of genetically distinct viruses, in the 
autophagic targeting of mitochondria and, more speculatively, in the 
potential autophagic targeting of other cellular targets such as hepatic 
lipid droplets and endoplasmic reticulum. The mechanism by which 
SMURF1 functions in selective autophagy is independent of its E3
ubiquitin ligase activity, but rather involves its C2 membrane-targeting 
domain. We propose that the C2 domain of SMURF1 may participate 
in the delivery of selective autophagic substrates to the nascent autop- 
hagosome. Thus, SMURF1 has parallel functions in two distinct cel- 
lular degradation pathways, targeting specific proteins for degradation 
by the ubiquitin-proteasomal pathway (via its E3 ubiquitin ligase
activity) and targeting selective cargo for degradation by the autophagy 
pathway (via its C2 domain). 

Our findings in Smurf1-/- MEFs and mice illustrate that our high-
content image-based genome-wide screen successfully reveals novel
candidate determinants of selective autophagy. More broadly, the 
identification of a set of 96 genes that may dually function in viral 
autophagy and mitophagy (but not in basal autophagy) suggests the 
existence of a common molecular network for targeting diverse 
unwanted cytoplasmic cargo to the lysosome. This network identifica- 
tion provides a basis for a more global understanding of the mechan- 
isms involved in selective autophagy. 


METHODS SUMMARY 

High-content image-based genome-wide siRNA screen. A genome-wide siRNA 
library (Dharmacon) containing 21,125 SMART pools was used for reverse trans- 
fection of HeLa/GFP-LC3 cells, followed by infection with SIN/mCherry.capsid 
virus, high content imaging using a Pathway855 automated microscope (BD 
Biosciences), quantitative image analysis, statistical analysis, and bioinformatic 
analysis as described in Supplementary Information. Primary hits were evaluated 
in three confirmation/secondary screens using the four individual siRNAs from 
each pool, including a screen for viral capsid/autophagosome colocalization, cell 
survival during SIN infection, and Parkin-induced mitophagy. 
Functional analyses of SMURF1. See Supplementary Information.
The endonuclease activity of Mili fuels piRNA
amplification that silences LINE1 elements


Serena De Fazio, Nenad Bartonicek, Monica Di Giacomo, Cei Abreu-Goodger, Aditya Sankar, Charlotta Funaya,
Claude Antony, Pedro N. Moreira, Anton J. Enright & Donal O’Carroll


Piwi proteins and Piwi-interacting RNAs (piRNAs) have conserved
functions in transposon silencing. The murine Piwi proteins Mili
and Miwi2 (also called Piwil2 and Piwil4, respectively) direct epigenetic
LINE1 and intracisternal A particle transposon silencing
during genome reprogramming in the embryonic male germ
line. Piwi proteins are proposed to be piRNA-guided endonucleases
that initiate secondary piRNA biogenesis; however,
the actual contribution of their endonuclease activities to piRNA
biogenesis and transposon silencing remains unknown. To investigate
the role of Piwi-catalysed endonucleolytic activity, we engineered
point mutations in mice that substitute the second aspartic
acid with an alanine in the DDH catalytic triad of Mili and Miwi2,
generating the Mili(DAH) and Miwi2(DAH) alleles, respectively. Analysis
of Mili-bound piRNAs from homozygous Mili(DAH) fetal gonadocytes
revealed a failure of transposon piRNA amplification, resulting
in the marked reduction of piRNA bound within Miwi2
ribonuclear particles. We find that Mili-mediated piRNA amplification
is selectively required for LINE1, but not intracisternal A particle,
silencing. The defective piRNA pathway in Mili(DAH) mice
results in spermatogenic failure and sterility. Surprisingly, homozygous
Miwi2(DAH) mice are fertile, transposon silencing is established
normally and no defects in secondary piRNA biogenesis are
observed. In addition, the hallmarks of piRNA amplification are
observed in Miwi2-deficient gonadocytes. We conclude that cycles
of intra-Mili secondary piRNA biogenesis fuel piRNA amplification
that is absolutely required for LINE1 silencing.

Transposable elements are mobile genetic elements that constitute a 
large fraction of eukaryotic genomes. The process of transposon silen- 
cing is of fundamental importance for genome integrity and germ cell 
development. Members of the Piwi subclade of the Argonaute proteins 
have conserved roles in transposon silencing and bind a class of small 
non-coding RNAs known as piRNAs that act as guides for targeting of 
the respective ribonuclear particles (RNPs). The Argonaute family is
primarily defined by the presence of the Piwi domain that adopts a
classical RNase H fold, with some Argonaute proteins being active
small-RNA-guided endonucleases (slicers). Mechanisms of piRNA
biogenesis are largely unclear, but two models summarize our current 
knowledge. First, primary processing of long single-stranded precursors 
by unknown nuclease(s) results in the generation of primary piRNAs. 
Thereafter, secondary biogenesis takes place via a ‘ping-pong’ cycle; 
that is, a feed-forward amplification loop wherein the slicer activities 
of two Piwi proteins take turns to generate secondary piRNAs using 
primary piRNAs as initial guides. Originally described in Drosophila,
where all three Drosophila Piwi proteins have proven endonuclease
activity in vitro, this secondary pathway model offers an explanation
for transposon silencing as piRNA biogenesis consumes transposon
transcripts. Nevertheless, the role of Piwi-mediated slicing in
secondary biogenesis or transposon silencing has not been directly tested in
any system. 


In mice, transposable elements are epigenetically silenced throughout 
most of life with transposon repression being initially established 
during germ cell development. The process of reprogramming in 
the fetal germ line initiates genome-wide CpG demethylation that 
erases both genomic imprints and retrotransposon promoter 
methylation. In male mice the Piwi proteins Mili and Miwi2 are 
essential for de novo DNA methylation and repression of LINE1 (L1) and 
intracisternal A particle (IAP) retrotransposons. The mouse 
piRNA model proposes that transposon-derived primary Mili-bound 
sense piRNAs initiate a round of secondary processing by targeting 
complementary antisense transposon transcripts for cleavage by Mili’s 
endonuclease activity. The 3’ cleavage fragment is then 3’ processed, 
which results in the accumulation of secondary antisense piRNAs in 
Miwi2 that guide transposon DNA methylation. Furthermore, these 
antisense piRNA-programmed Miwi2 RNPs are also proposed to target 
transposon transcripts for cleavage followed by another round of 
secondary piRNA biogenesis that serves to amplify the primary 
transposon piRNAs. Here we explore the contribution of the putative 
endonuclease activities of Mili and Miwi2 to the embryonic piRNA 
pathway and transposon silencing. 

Figure 1 | The endonuclease activity of Mili is required for spermatogenesis 
and L1 silencing. a, Testicular atrophy in Mili^DAH mice. Testicular weights of 
3-month-old wild-type (WT) and Mili^DAH mice are shown. b, Haematoxylin- 
and eosin-stained wild-type and Mili^DAH testis sections from 3-month-old 
mice. The percentage of Mili^DAH mice with the indicated phenotype is shown. 
c, Immunofluorescence using anti-L1 Orf1 and anti-IAP Gag antibodies 
(green) and DAPI-stained DNA (blue) on wild-type and Mili^DAH E16.5 fetal 
testis sections are shown. d, Western blot (WB) using anti-L1 Orf1 and 
anti-tubulin antibodies on extracts from P10 wild-type and Mili^DAH testes is 
shown. e, The expression levels of L1 and IAP were quantified by qRT-PCR 
from RNA derived from P10 wild-type and Mili^DAH testes. Error bars indicate 
standard deviation from biological triplicates (n = 3). f, Methylation-sensitive 
Southern blot on HpaII-digested DNA extracted from P10 wild-type and 
Mili^DAH testis using an L1 promoter probe. The arrowhead indicates the 
identity of the methylation-sensitive fragment. 

We decided to use the same genetic strategy that has recently 
been used to explore the in vivo endonuclease function of Argonaute 
2 (Ago2)'*"*. Mutation of the second aspartic acid to an alanine in the 
catalytic triad DDH (the Ago2”° or Ago2”“" mutation/allele) abro- 
gates Ago2’s catalytic function without affecting protein stability and 
other functions of the protein in vivo’*'®. We first generated knock-in 
Mili^D813A/D813A or Mili^DAH mice (Supplementary Fig. 1a-c). The point 
mutation did not have an impact on Mili protein expression 
(Supplementary Fig. 1d). Mili localizes to the inter-mitochondrial 
cement, a part of the germ-cell-specific perinuclear nuage, and has 
been shown to interact with the Tudor domain protein Tdrd1 as well as 
the RNA helicases Mov10L1 and Mvh (refs 18-22). Co-localization of Mili 
with Tdrd1, Mov10L1 and Mvh was observed in fetal gonadocytes of 
wild-type and Mili^DAH mice (Supplementary Fig. 2). Furthermore, 
interaction of Mili with Tdrd1 and Mov10L1 was confirmed in both 
wild-type and Mili^DAH mice (Supplementary Table 1). In contrast to 
Mili, Tdrd1 or Mvh deficiency, normal inter-mitochondrial 
cement was observed in Mili^DAH gonadocytes (Supplementary Fig. 3). 
Therefore, the Mili^DAH point mutation does not have an impact on 
inter-mitochondrial cement integrity, expression levels or localization 
of Mili. 

Deletion of Mili or Miwi2 leads to arrest in meiotic prophase, 
probably attributable to secondary consequences of defective 
transposon DNA methylation and de-repression. All Mili^DAH testes were 
atrophic (Fig. 1a), with most (70%) of Mili^DAH mice presenting the 
same meiotic phenotype as Mili^-/- mice. Early pachytene was the 
most advanced stage of germ cell development observed in these mice 
(Fig. 1b and Supplementary Fig. 4). The other 30% of Mili^DAH mice 
presented a slightly milder phenotype. Tubules containing round and 
even elongating spermatids were observed, albeit with cells of aberrant 
morphology (Fig. 1b and Supplementary Fig. 5). All Mili^DAH tubules 
were highly apoptotic (Supplementary Figs 4b and 5c). However, 
independent of the phenotype observed, all Mili^DAH male mice are sterile 
(Supplementary Fig. 1e). The process of transposon de novo DNA 
methylation occurs during late gestation in mice, concluding in postnatal 
germ cells a few days after birth. De-repression of L1 elements was 
observed in Mili^DAH embryonic day (E)16.5 gonadocytes and postnatal 
and adult germ cells (Fig. 1c-e and Supplementary Fig. 6a, b). With 
respect to the process of transposon silencing, no heterogeneity in adult 
Mili^DAH mice was observed (Supplementary Fig. 6), raising the 
possibility that the distinct adult spermatogenic phenotypes may arise 
from a genetic background effect or that Mili and its associated 
endonuclease activity may function in processes beyond transposon 
silencing in meiotic or post-meiotic cells. Notably, in contrast to 
Mili^-/- mice, IAPs are normally repressed in Mili^DAH mice (Fig. 1c-e 
and Supplementary Fig. 6a, b). Accordingly, defective CpG DNA 
methylation of L1 but not IAP elements was observed in postnatal germ 
cells at time points after the completion of the de novo DNA methylation 
process (Fig. 1f and Supplementary Fig. 6c). Collectively, these data 
indicate that the endonuclease activity of Mili is absolutely required for 
L1 silencing. 

Figure 2 | piRNA amplification failure in Mili^DAH mice. a, Mili RNPs were 
immunoprecipitated from E16.5 fetal testis of the indicated genotypes. 
Associated piRNAs were visualized by 5’ 32P labelling, resolution on a 15% TBE 
urea gel and autoradiography. nt, nucleotide. b, Size profiles of cloned 
Mili-bound piRNAs from biological replicates of wild-type (blue) and Mili^DAH 
(pink) E16.5 fetal gonadocytes. c, Genomic annotation of cloned Mili-bound 
piRNAs as indicated from pairs of biological replicates of wild-type and 
Mili^DAH E16.5 fetal gonadocytes. LTR, long terminal repeat; ncRNA, 
non-coding RNA. d, Mapping of Mili-bound piRNAs to the consensus of L1 (left) 
and IAP (right) elements. Positive and negative values indicate sense and 
antisense piRNAs, respectively. Schematic representations of the respective 
elements are also shown (above). e, Percentage of piRNAs from wild-type and 
Mili^DAH Mili RNPs with a U at the first position (1U) without A at position 10 
and an A at position 10 (10A) without a U at position 1 are shown for L1 and 
IAP elements. Error bars indicate standard error of the mean from the 
biological duplicates (n = 2). f, Ping-pong analysis of Mili-bound piRNAs from 
biological replicates of wild-type and Mili^DAH E16.5 fetal gonadocytes. Relative 
frequency (y axis) of distances between 5’ ends (x axis) between complementary 
piRNAs for both L1 and IAP elements is shown. nt, nucleotides. 

We next analysed Mili-bound piRNAs to understand the impact of 
the Mili^DAH mutation on the piRNA pathway. Equal quantities of 
piRNA were observed in Mili RNPs from Mili^DAH fetal gonadocytes 
(Fig. 2a). piRNA libraries from biological duplicates of Mili RNPs from 
wild-type and Mili^DAH fetal gonadocytes were subjected to deep 
sequencing. 20.3% of piRNAs are longer in Mili^DAH versus wild-type 
gonadocytes (Fig. 2b). Among repetitive element piRNAs, a 4.2- and 
5.1-fold reduction in the fraction of L1 and IAP piRNAs, respectively, 
is observed in Mili^DAH libraries (Fig. 2c). When we mapped the reads 
to the genome, no differences were observed in the genomic origin 
between wild-type and Mili^DAH piRNAs (Supplementary Fig. 7), 
indicating that the endonuclease activity of Mili is not required for the 
selective expression of specific piRNA clusters. We next performed a 
qualitative analysis of L1 and IAP piRNAs. For this analysis, we only 
consider unique reads and not their individual depths within libraries. 
No differences in L1 and IAP piRNA formation were observed in the 
wild-type and Mili^DAH libraries (Supplementary Fig. 8a). We therefore 
The precise endonucleolytic cleavage site of an Argonaute protein is 
at the nucleotide opposed to the tenth nucleotide from the 5’ end of the 
guide small RNA. A 5’ U (1U) bias is a characteristic of primary 
piRNAs; therefore, when a primary piRNA directs cleavage of a target 
followed by 3’ processing, the secondary piRNA will contain a bias for 
A at position 10 (refs 5, 6). In addition, there is an overlap of 10 
nucleotides; this precise complementarity is detected when comparing 
distances between 5’ ends of complementary piRNAs and is known as 
the ping-pong signature. A second round of piRNA biogenesis 
fuelled by the secondary piRNA results in the amplification of the 
initiating primary piRNA. Analysis of Mili-bound piRNAs, now taking 
depth into consideration, revealed reduced levels of both L1 and 
IAP piRNAs in Mili^DAH fetal gonadocytes (Fig. 2c, d). The increase in 
1U- and the decrease in 10A-containing piRNAs in Mili^DAH libraries are 
indicative of the failure of piRNA amplification (Fig. 2e). Notably, the 
ping-pong signature is lost for both L1 and IAP piRNAs in Mili^DAH 
mice (Fig. 2f). Collectively, these data are consistent with a function of 
the endonuclease activity of Mili in the initiation of secondary 
processing required for transposon piRNA amplification. 
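
The 1U and 10A categories used here (a U at position 1 without an A at 
position 10, and an A at position 10 without a U at position 1) can be 
illustrated with a minimal Python sketch. The function name 
`nucleotide_bias`, the input format (a plain list of read sequences) and 
the toy reads are assumptions made for illustration only; this is not the 
analysis code used in the study. 

```python
def nucleotide_bias(reads):
    """Return the percentage of reads that are 1U-only and 10A-only.

    A read is '1U' if it starts with U and has no A at position 10,
    and '10A' if it has an A at position 10 and does not start with U.
    Reads with both or neither feature fall into no category.
    """
    one_u_only = 0
    ten_a_only = 0
    usable = 0
    for seq in reads:
        # Accept DNA-style reads by treating T as U (illustrative choice).
        seq = seq.upper().replace("T", "U")
        if len(seq) < 10:
            continue  # position 10 does not exist in this read
        usable += 1
        has_1u = seq[0] == "U"
        has_10a = seq[9] == "A"
        if has_1u and not has_10a:
            one_u_only += 1
        elif has_10a and not has_1u:
            ten_a_only += 1
    if usable == 0:
        return {"1U": 0.0, "10A": 0.0}
    return {
        "1U": 100.0 * one_u_only / usable,
        "10A": 100.0 * ten_a_only / usable,
    }


# Hypothetical toy reads, 24 nt each.
toy_reads = [
    "U" + "G" * 23,              # 1U only
    "G" * 9 + "A" + "G" * 14,    # 10A only
    "U" + "G" * 8 + "A" + "G" * 14,  # both features, counted in neither
    "T" + "C" * 23,              # DNA-style read, 1U only
]
print(nucleotide_bias(toy_reads))
```

A library dominated by primary piRNAs would be expected to shift towards 
the 1U category, whereas active secondary processing adds 10A reads. 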

The lack of secondary processing in Mili^DAH gonadocytes resulted 
in the stark reduction of piRNA in Miwi2 RNPs (Fig. 3a). The 
additional cycles required to generate libraries from Mili^DAH gonadocytes 
supported the observation of a significant decrease in piRNAs within 
Miwi2 from Mili^DAH gonadocytes (Supplementary Table 2). Miwi2 
Figure 3 | Marked reduction of Miwi2-bound piRNAs in Mili^DAH mice. 
a, Miwi2 RNPs were immunoprecipitated from E16.5 fetal testis of the 
indicated genotypes and piRNA represented as in Fig. 2a. b, Confocal 
projection images of indirect immunofluorescence with Tdrd9, Dcp1a and 
Miwi2 antibodies (green) and DAPI-stained DNA (red) from E16.5 fetal testis 
of the indicated genotypes. c, Annotation of cloned Miwi2-bound piRNAs as 
indicated from biological replicates of wild-type and Mili^DAH E16.5 fetal 
gonadocytes. d, Mapping of the Miwi2-bound piRNAs to the consensus of L1 
(left) and IAP (right) elements. Positive and negative values indicate sense and 
antisense piRNAs, respectively. Schematic representations of the respective 
elements are also shown. 
Dcp1a revealed intact piP-body formation in Mili^DAH gonadocytes 
(Fig. 3b). Furthermore, Miwi2 shows cytoplasmic foci and, consistent 
with the partial piRNA loading, retained nuclear localization in 
Mili^DAH gonadocytes albeit with a lower staining intensity (Fig. 3b). 
Sequencing revealed a piRNA population of a normal size profile 
featuring an approximately 2-fold decrease in the percentage of both 
LINE- and IAP-associated piRNAs (Fig. 3c and Supplementary Fig. 
9a). However, when mapped to L1 and IAP consensus sequences, the 
Miwi2-bound piRNAs from Mili^DAH mice showed a pattern of piRNA 
formation equivalent to wild type (Fig. 3d and Supplementary Fig. 8b). 
Whereas Miwi2 bound sense and antisense transposon piRNAs in 
almost equal measure in both wild-type and Mili^DAH gonadocytes 
(Supplementary Table 2), as previously described, the amplitude of 
the antisense piRNA peaks derived across L1 was diminished 
compared to those previously observed. This difference probably reflects 
experimental variation due to the antibody used or recent advances in 
library generation/sequencing. No difference in the genomic origin of 

Therefore, the endonuclease activity of Mili does not alter the identity 
of piRNAs that reside within Miwi2 RNPs, merely their quantity. 
The fact that Miwi2 is seeded with the appropriate but severely 
reduced quantities of piRNAs in Mili^DAH mice indicates that the 
potential endonuclease activity of Miwi2 cannot compensate for the 
loss of Mili’s slicer activity. To understand the contribution of Miwi2’s 
putative endonuclease activity to piRNA amplification and transposon 
silencing, we generated Miwi2^D761A/D761A (Miwi2^DAH) and Miwi2^-/- 
mice (Supplementary Figs 10 and 11). Surprisingly, Miwi2^DAH mice 
are fertile with no defects observed in testis morphology (Fig. 4a). 
Furthermore, in contrast to Miwi2^-/- fetal gonadocytes, both L1 
and IAP are normally repressed (Fig. 4b, c). In addition to transposon 
silencing, Miwi2 is specifically required for the maintenance of 
spermatogonial stem cells. No loss of germ cells is observed in 
Miwi2^DAH aged mice (9-12 months) (Supplementary Fig. 12). 
Therefore, the Miwi2^DAH mutation does not have an impact on any 
known physiological function of Miwi2. Accordingly, both Mili and 
Figure 4 | Normal spermatogenesis and transposon silencing in Miwi2^DAH 
mice. a, Haematoxylin- and eosin-stained wild-type, Miwi2^DAH and Miwi2^-/- 
testis sections from 3-month-old mice. b, Immunofluorescence using anti-L1 
Orf1 and anti-IAP Gag antibodies (green) and DAPI-stained DNA (blue) on 
wild-type, Miwi2^DAH and Miwi2^-/- E16.5 fetal testis sections are shown. 
c, Methylation-sensitive Southern blot on HpaII-digested DNA extracted from 
P10 wild-type, Miwi2^DAH and Miwi2^-/- testis using an L1 promoter probe is 
shown. The arrow indicates the identity of the methylation-sensitive fragment. 
d, Mili (left) and Miwi2 (right) RNPs were immunoprecipitated from E16.5 
fetal testis of the indicated genotypes shown as in Fig. 2a. e, Mapping of the 
Mili-bound (left) and Miwi2-bound (right) piRNAs to the consensus of L1. 
Positive and negative values indicate sense and antisense piRNAs, respectively. 
Schematic representation of L1 is shown (above). f, Ping-pong analysis of Mili- 
and Miwi2-bound piRNAs from biological replicates of wild-type and 
Miwi2^DAH E16.5 fetal gonadocytes is shown. The frequency of the distance 
between 5’ ends of complementary piRNAs for L1 is presented as in Fig. 2e. 
g, Model. L1 element silencing is dependent upon Mili’s endonuclease activity 
for piRNA amplification. IAP silencing is dependent upon Mili and Miwi2 but 
independent of piRNA amplification. The small blue and green lines represent 
sense and antisense piRNAs, respectively. 
(Fig. 4d). No qualitative or quantitative differences in piRNA origin or 
amplification were observed in Mili- or Miwi2-bound piRNAs from 
Miwi2^DAH fetal gonadocytes (Fig. 4e, f and Supplementary Figs 13-15). 
The lack of physiological and molecular phenotype in Miwi2^DAH mice 
argues against the possibility that Miwi2-mediated cleavage events 
functionally contribute to either slicing of nuclear transposon 
transcripts or piRNA amplification necessary for the establishment of 
transposon silencing. In support of the latter conclusion, we find 
piRNA amplification as judged by the existence of a robust ping-pong 
signature in Mili-bound piRNAs from Miwi2^-/- fetal gonadocytes 
(Supplementary Fig. 16). 

We propose a revised model of mammalian embryonic piRNA function, 
whereby Mili’s endonuclease activity initiates secondary piRNA 
processing with an intra-Mili ping-pong cycle fuelling piRNA 
amplification (Fig. 4g). Mili’s endonuclease activity is specifically required 
for the expansion of L1 and IAP piRNAs within Mili RNPs and the 
normal accumulation of all classes of piRNAs within Miwi2 RNPs. The 
defective piRNA pathway in Mili^DAH mice results in the failure to 
repress specifically L1, revealing distinct silencing requirements for 
the respective transposons. The establishment of L1- but not IAP- 
silencing is strictly dependent upon Mili’s endonuclease activity and 
secondary piRNA biogenesis (Fig. 4g). The sufficiency of primary 
piRNA processing in Mili^DAH mice to direct Miwi2-mediated IAP 
repression illustrates fundamental differences in piRNA dosage 
required for silencing of the respective transposons. This difference 
may reflect their genomic burden: L1 occupies approximately 19% 
of the mouse genome whereas IAPs account for only 0.2% (ref. 28). 
Therefore, amplification of the L1 piRNA pool may be essential to 
program sufficient Miwi2 to target the prolific L1 element. In 
conclusion, we show that a single Piwi endonuclease supports piRNA 
amplification and distinct transposon silencing. It remains to be seen if these 
are conserved features of piRNA-mediated transposon silencing. 


METHODS SUMMARY 


Histology. Testes were fixed in Bouin’s fixative overnight at 4 °C, 
paraffin embedded and sectioned at 6-8 µm thickness. Sections were then stained 
(haematoxylin and eosin or periodic acid Schiff) by routine methods. 
Immunofluorescence. For immunofluorescence, E16.5 fetal testes were freshly 
embedded in OCT, 6 µm sections cut and fixed in 4% paraformaldehyde. For Mili 
co-localization with Mvh, Tdrd1 and Mov10L1, sections were boiled twice for 5 min 
in 10 mM sodium citrate solution (pH 6) for antigen retrieval. Sections were blocked 
for 30 min at room temperature in 10% normal donkey serum, 2% BSA and 0.1 M 
glycine. Primary antibody incubation was done overnight at 4 °C in the blocking 
buffer. Anti-rabbit Alexa-488-conjugated (1:1,000; Invitrogen) and anti-mouse 
Alexa-546-conjugated (1:1,000; Invitrogen) antibodies were used as secondary 
antibodies. DAPI (3 µg µl^-1) (Sigma) was used to stain DNA. For Miwi2, L1 
Orf1 and IAP staining, paraformaldehyde-fixed sections were permeabilized with 
0.1% Triton X-100 for 10 min and then blocked in TBS-T 10% normal donkey 
serum and processed as above. A Leica TCS SP5 confocal microscope was used to 
acquire images. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
Mouse strains. Mili^DAH allele: the Piwi domain of Mili is encoded in exons 20 to 
23. Mili aspartic acid 813, the second aspartic acid of the catalytic triad, is encoded 
in exon 21. To generate the Mili^D813A allele we replaced wild-type exon 21 with a 
mutant exon where the aspartic acid 813 codon is mutated to encode an alanine. 
A targeting construct was recombineered that contains homology arms and an 
frt-flanked neomycin (neo) cassette 3’ of exon 21 that contains the Mili^D813A 
mutation. Southern blotting of the individual ES-cell-derived clone genomic 
SacI-digested DNA with a 3’ external probe was used to identify homologous 
recombinants. A 9.2-kb DNA fragment corresponds to the wild-type Mili locus; 
integration of the frt-flanked neo cassette 3’ of exon 21 introduces an additional 
SacI site, thus decreasing the size of the SacI DNA fragment recognized to 8.0 kb in 
the targeted allele. Flp-mediated recombination and excision of the frt-flanked 
neo cassette results in a 6-kb SacI DNA fragment recognized by the external 
3’ probe, which is diagnostic of the Mili^DAH allele. 

Miwi2^DAH allele: the Piwi domain of Miwi2 is encoded in exons 14-19. 
Miwi2 aspartic acid 761, the second aspartic acid of the catalytic triad, is 
encoded in exon 17. To generate the Miwi2^D761A allele we replaced wild-type 
exon 17 with a mutant exon where the aspartic acid 761 codon is mutated to 
encode an alanine. A targeting construct was recombineered that contains 
homology arms and an frt-flanked neomycin (neo) cassette 3’ of exon 17 that 
contains the Miwi2^D761A mutation. Southern blotting of the individual 
ES-cell-derived clone genomic BsrGI-digested DNA with an external 5’ probe was 
used to identify homologous recombinants. An 11.8-kb DNA fragment 
corresponds to the wild-type Miwi2 locus; integration of the frt-flanked neo cassette 
3’ of exon 17 introduces an additional BsrGI site, thus decreasing the size of 
the BsrGI DNA fragment recognized to 8.4 kb. Flp-mediated recombination 
and excision of the frt-flanked neo cassette results in a 6.4-kb BsrGI DNA 
fragment recognized by the external 3’ probe, which is diagnostic of the 
Miwi2^DAH allele. 

Miwi2 null allele: for the Miwi2^- allele, we flanked exon 17 with loxP sites. 
Cre-mediated deletion of exon 17 results in out-of-frame splicing between exon 
16 and 18, resulting in stop codons before the last exon, targeting the mutant 
transcript for nonsense-mediated decay. Should a fraction of the mutant 
transcript escape nonsense-mediated decay, a truncated protein would be made that 
lacks most of the Piwi domain and thus would probably be non-functional. To 
generate this allele, a targeting construct was generated that contains the same 
homology arms as the Mili^D813A construct, an frt-flanked neo cassette with a loxP 
site 3’ of exon 17 and a second 5’ loxP site. Southern blotting of the individual 
ES-cell-derived clone genomic KpnI-digested DNA with an external 5’ probe was 
used to identify homologous recombinants. A 10.1-kb DNA fragment 
corresponds to the wild-type Miwi2 locus; integration of the neo-frt2-loxP cassette 
3’ of exon 17 introduces an additional KpnI site, thus decreasing the size of the 
KpnI DNA fragment recognized to 8.0 kb. Cre-mediated recombination and 
excision of exon 17 and the loxP-flanked neo cassette results in a 5.3-kb KpnI 
DNA fragment recognized by the external 3’ probe, which is diagnostic of the 
Miwi2 null (Miwi2^-) allele. 

The Miwi2 targeting constructs and the Mili targeting construct were 
electroporated into IB10 and A9 ES cells, respectively. A9 ES cells are derived from 
hybrid embryos resulting from a 129/Sv male by C57BL/6 female cross (A. Wutz, 
manuscript in preparation). Southern blotting as described above of the individual 
ES-cell-derived clones was used to identify homologous recombinants. 
IB10-targeted ES cells were used to generate chimaeras for the respective Miwi2 targeted 
alleles by standard blastocyst injections. A9-targeted ES cells were injected into 
C57BL/6 8-cell-stage embryos for the generation of fully ES-cell-derived mice 
following a procedure similar to one described previously but using a Piezo 
Impact Unit (PMM150FU, Prime Tech) rather than a laser to puncture the zona 
pellucida of the host embryo. The Mili^DAH- and Miwi2^DAH-targeted mice were then 
crossed to the FLP-expressing transgenic mice (FLPeR) to remove the frt-flanked 
neo cassette, resulting in the generation of Mili^DAH and Miwi2^DAH alleles, 
respectively. Mice heterozygous for the loxP-flanked Miwi2-targeted allele were crossed 
to Deleter Cre to generate the Miwi2 null (Miwi2^-) allele. The mice analysed in 
this study were on a mixed C57BL/6 and 129 genetic background. 

All of the mice were bred and maintained in EMBL Mouse Biology Unit, 
Monterotondo in accordance with current Italian legislation (Art. 9, 27 January 
1992, number 116) under license from the Italian health ministry. 

Antibodies. Rabbit polyclonal antibodies against mouse Miwi2 were generated 
using the same epitope as described previously and used for 
immunoprecipitation and immunofluorescence (1:200) experiments. A mouse monoclonal 
antibody against Mili was obtained from R. Pillai and used for immunoprecipitation 
and immunofluorescence (1:1,000) experiments. The following antibodies were 
used at the indicated dilutions for immunofluorescence: anti-L1 Orf1 (S. Martin; 
1:250), anti-IAP Gag (B. Cullen; 1:500), anti-Mvh (Abcam (ab13840); 1:200), 
anti-Mov10L1 (R. Pillai; 1:200), anti-Tdrd1 (R. Pillai; 1:200), anti-Tdrd9 (S. 
Chuma; 1:200) and anti-Dcp1a (J. Lykke-Andersen; 1:500). 
Detection of apoptotic cells. Detection of apoptotic cells was performed on 
paraformaldehyde-fixed paraffin-embedded testis sections using the in situ cell 
death detection kit (Roche) and developed with DAB substrate (Roche). 
Sections were also stained with haematoxylin. 
Electron microscopy. Fetal testes were fixed in 2.5% glutaraldehyde in 50 mM 
cacodylate buffer supplemented with 2% sucrose, 50 mM KCl, 2.6 mM CaCl2 and 
2.6 mM MgCl2 for 30 min at 4 °C and rinsed in 50 mM cacodylate buffer. Samples 
were incubated in 2% osmium in 50 mM cacodylate buffer for 40 min on ice, rinsed 
in water and incubated in 0.5% uranylacetate in water for 30 min on ice. The contrast 
enhancement procedure was followed by a stepwise dehydration in ethanol, up to 
100% ethanol and infiltration in EPON (Roth) for embedding. Polymerization was 
done at 60 °C. Ultrathin sections (60-70-nm thickness) of the testis were obtained 
using an ultramicrotome (Leica Microsystems) and the sections were mounted on 
formvar-coated slot grids, and contrasted with uranylacetate and lead citrate. The 
sections were then viewed in a CM120 biotwin electron microscope (FEI) operating 
at 100 kV. Digital acquisitions were made with a Keen View CCD camera (Soft 
Imaging System). 
Germ cell isolation. The previously described Oct4-GFP (ref. 34) allele was 
crossed into Mili^DAH and Miwi2^DAH mice to label germ cells with eGFP. To isolate 
E16.5 and postnatal day 7 germ cells, single cell suspensions of testis were obtained 
by two-step enzymatic digestion and GFP-positive cells were FACS sorted. 
CpG methylation analysis. Methylation-sensitive Southern blotting and bisulphite 
methylation analysis were performed as described previously. 
Mass spectrometry. Mili was immunoprecipitated as described previously, resolved 
on gel and the entire lane with the exception of the immunoglobulin fragments was 
subjected to liquid chromatography coupled to tandem mass spectrometry 
(LC-MS/MS) on an LTQ Orbitrap Velos (Thermo Fisher Scientific) instrument. 
RT-qPCR analysis. Total RNA was isolated from 10 day post-partum testis using 
Trizol according to the manufacturer’s instructions and treated with Turbo DNase 
RNase-free (Ambion). cDNA synthesis was performed with SuperScript III 
Reverse Transcriptase (Invitrogen) with Random Hexamers (Invitrogen). 
Quantitative PCR was carried out by using SYBR Green I Master mix (Roche) 
on a LightCycler 480 system (Roche). Three animals for each genotype were 
examined and assays were always done in triplicate. Primers for qPCR were used 
as described previously. 
Small RNA library generation. Mili and Miwi2 RNPs were immunoprecipitated 
as described previously. Small RNA libraries were generated as described previously but 
using adaptors suitable for sequencing on the Illumina platform. 
Data analysis. Sequencing data were processed from the FASTQ format and 
analysed using R/Bioconductor. Barcode sequences were resolved by sample 
with no mismatches allowed, 5’/3’ adaptors were stripped, sequences were filtered 
for low-complexity regions and finally size-selected for reads of 24-30 nt. 
Processed reads for each sample were mapped against the mouse genome 
(NCBIm37) using Bowtie 0.12.5 (ref. 37) allowing for two mismatches and 
requesting all matching sites. For reads mapping to multiple distinct loci, only 
the first 100 were reported according to match score. Mapped reads were categorized 
according to genomic annotations from Ensembl Mouse v58 (LTR, LINE, SINE, 
genic, non-coding RNA). Reads not mapping to any recorded genomic element 
were classed as ‘other’. Counts for reads mapping to multiple loci were divided by the 
total number of loci. In the specific cases of LINE and IAP elements, all reads were 
also mapped against representative canonical sequences obtained from GenBank 
(M13002.1, EU183301.1) using Bowtie allowing for up to three mismatches. The 
number of repeat mapping reads was divided by repeat length and the number of 
genome mapping reads per billion processed reads. Genome-wide visualizations of 
read mapping to the mouse genome were obtained using Circos plots and were 
scaled between each set of four samples according to library size (see figure legend). 
Line-track y-axis maxima are set to one-tenth of the highest peak from the library 
with most reads. Heat-map minima were set to one-half the average read count 
across all bins and maxima to one-half the maximum read count across all bins. 
Heat-map scaling across colour-space was performed using the ‘scale_log_base = 5’ 
parameter of Circos to sample colour space better. 
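
Two of the steps described above, the 24-30 nt size selection and the 
division of counts for multi-mapping reads by the number of loci, can be 
sketched as follows. The original analysis was done in R/Bioconductor; 
this Python version is only an illustration, and the function names, the 
input layout (one list of per-locus annotation categories per read) and 
the toy data are assumptions rather than the authors’ code. 

```python
from collections import defaultdict


def size_select(reads, min_len=24, max_len=30):
    """Keep only reads whose length lies in [min_len, max_len] nt."""
    return [r for r in reads if min_len <= len(r) <= max_len]


def fractional_counts(alignments):
    """Sum per-category counts, splitting each read across its loci.

    `alignments` maps a read identifier to a list holding one annotation
    category per mapped locus (None where the locus has no recorded
    annotation). A read mapping to n loci contributes 1/n to the
    category of each locus; unannotated loci are counted as 'other'.
    """
    totals = defaultdict(float)
    for categories in alignments.values():
        if not categories:
            continue  # unmapped read, contributes nothing
        weight = 1.0 / len(categories)
        for category in categories:
            totals[category if category is not None else "other"] += weight
    return dict(totals)


# Hypothetical toy input: four reads with one to four mapped loci each.
toy_alignments = {
    "read1": ["LINE"],
    "read2": ["LINE", "LTR"],
    "read3": [None],
    "read4": ["LTR", "LTR", "LINE", "genic"],
}
print(fractional_counts(toy_alignments))
```

With this weighting every mapped read contributes a total of one count, 
so category totals sum to the number of mapped reads regardless of how 
repetitive the underlying sequence is. 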

For ping-pong analysis, only reads mapping to repeat elements were considered. 
For each pair of sense/antisense overlapping reads, the distance between their 5’ 
ends was recorded and counts were represented as relative frequencies within 
samples for each repeat element. 
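
The ping-pong tabulation described in this paragraph can be sketched in 
Python as below. The coordinate convention (1-based inclusive start and 
end positions on a repeat consensus, with the 5’ end of a sense read at 
its start and the 5’ end of an antisense read at its end) and the name 
`ping_pong_profile` are assumptions made for this illustration; the 
quadratic pairwise loop is chosen for clarity, not speed. 

```python
from collections import Counter


def ping_pong_profile(sense, antisense):
    """Relative frequency of 5'-end distances for overlapping read pairs.

    `sense` and `antisense` are lists of (start, end) tuples in 1-based
    inclusive consensus coordinates. For each overlapping sense/antisense
    pair, the distance is the number of nucleotides spanned between the
    two 5' ends: antisense end - sense start + 1. A ping-pong pair
    therefore gives a distance of 10.
    """
    counts = Counter()
    for s_start, s_end in sense:
        for a_start, a_end in antisense:
            overlaps = a_start <= s_end and s_start <= a_end
            if overlaps:
                counts[a_end - s_start + 1] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {dist: n / total for dist, n in sorted(counts.items())}


# Hypothetical toy reads on a repeat consensus.
toy_sense = [(100, 125), (200, 226)]
toy_antisense = [(84, 109), (185, 209), (120, 146)]
print(ping_pong_profile(toy_sense, toy_antisense))
```

In this toy input two of the three overlapping pairs share a 10-nt 
overlap of their 5’ ends, so the profile peaks at a distance of 10, 
which is the shape expected when secondary processing is active. 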
Ascaris suum draft genome 

Parasitic diseases have a devastating, long-term impact on human 
health, welfare and food production worldwide. More than two 
billion people are infected with geohelminths, including the 
roundworms Ascaris (common roundworm), Necator and Ancylostoma 
(hookworms), and Trichuris (whipworm), mainly in developing or 
impoverished nations of Asia, Africa and Latin America. In 
humans, the diseases caused by these parasites result in about 
135,000 deaths annually, with a global burden comparable with that 
of malaria or tuberculosis in disability-adjusted life years. Ascaris 
alone infects around 1.2 billion people and, in children, causes 
nutritional deficiency, impaired physical and cognitive 
development and, in severe cases, death. Ascaris also causes major 
production losses in pigs owing to reduced growth, failure to thrive and 
mortality. The Ascaris-swine model makes it possible to study the 
parasite, its relationship with the host, and ascariasis at the molecu- 
lar level. To enable such molecular studies, we report the 273 mega- 
base draft genome of Ascaris suum and compare it with other 
nematode genomes. This genome has low repeat content (4.4%) 
and encodes about 18,500 protein-coding genes. Notably, the 
A. suum secretome (about 750 molecules) is rich in peptidases 
linked to the penetration and degradation of host tissues, and an 
assemblage of molecules likely to modulate or evade host immune 
responses. This genome provides a comprehensive resource to the 
scientific community and underpins the development of new and 
urgently needed interventions (drugs, vaccines and diagnostic tests) 
against ascariasis and other nematodiases. 

We sequenced the A. suum genome at ~80-fold coverage (Sup- 
plementary Fig. 1), producing a final draft assembly of 272,782,664 
base pairs (bp) (N50 = 407 kilobases, kb; N90 = 80 kb; 1,618 contigs of 


Table 1 | Features of the Ascaris suum draft genome 

Estimated genome size in megabases                                    309 
Total number of base pairs within assembled scaffolds                 272,782,664 
N50 length in bp; total number >2 kb in length                        407,899; 1,618 
N90 length in bp; total number >N90 length                            80,017; 748 
GC content of whole genome (%)                                        37.9 
Repetitive sequences (%)                                              4.4 
Proportion of genome that is coding (exonic; including introns) (%)   5.9; 44.2 
Number of putative coding genes                                       18,542 
Gene size (mean bp)                                                   6,536 
Average coding domain length (mean bp)                                983 
Average exon number per gene (mean)                                   6 
Gene exon length (mean bp)                                            153 
Gene intron length (mean bp) 
GC content in coding regions (%)                                      45 
Number of transfer RNAs 

N50 means 50% of all nucleotides in the assembly are within contigs of ≥408 kb. N90 means 90% of all 
nucleotides in the assembly are within contigs of ≥80 kb. Genome size was estimated on the basis of k-mer 
frequency (see online-only Methods). 


>2 kb) (Table 1) with a mean GC-content of 37.9%. This genome has 
few repetitive sequences (about 4.4% of the total assembly) relative to those 
reported for other metazoan genomes sequenced to date, probably as 
a result of chromatin diminution. We identified 424 distinct retro- 
transposon sequences (see Supplementary Tables 1-3) representing at 
least 22 families (8 long terminal repeats (LTRs), 12 long interspersed 
elements (LINEs) and 2 short interspersed elements (SINEs)), with 
Gypsy, Pao and Copia classes predominating for LTRs (n = 97, 85 
and 60, respectively) and CR1, L1, and reverse transcriptase encoding 
RTE-RTE classes predominating for non-LTRs (n = 29, 28 and 21, 
respectively). We also identified eight families of DNA transposons 
(91 distinct sequences in total), of which MuDr, En-Spm and 
Merlin (n = 12, 9 and 8, respectively) predominated. We predicted 
18,542 genes (14,783 supported by transcriptomic data), with a mean 
total length of 6.5 kb, exon length of 153 bp and a mean of 6.4 exons per 
gene (see Supplementary Fig. 2). Compared with those of the nematodes 
(roundworms) Caenorhabditis elegans, Pristionchus pacificus, 
Brugia malayi or Meloidogyne hapla, the A. suum genes are, overall, 
significantly longer (see Supplementary Table 2), relating primarily to 
expansions of intronic regions (mean 1.1 kb). 
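
The N50 and N90 statistics quoted for the assembly follow the definition given with Table 1: the sequence length at which the running total of lengths, taken from longest to shortest, first reaches the stated fraction of the assembly. A minimal sketch, using a hypothetical helper name `nx_length` and made-up toy lengths rather than the A. suum scaffolds:

```python
def nx_length(lengths, fraction=0.5):
    """Length L such that sequences of length >= L hold `fraction` of all bases.

    fraction=0.5 gives N50; fraction=0.9 gives N90.
    """
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    raise ValueError("no sequence lengths supplied")


# Toy example (illustrative values only).
toy_lengths = [100, 80, 60, 40, 20]
n50 = nx_length(toy_lengths, 0.5)
n90 = nx_length(toy_lengths, 0.9)
```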

Most (78.2%) of the predicted A. suum genes (Fig. 1) have a homo- 
logue (BLASTp cut-off ≤10⁻⁵) either in C. elegans (n = 12,779; 
68.9%), B. malayi (12,853; 69.3%), M. hapla (10,482; 56.5%) or 
P. pacificus (11,865; 64.0%), with 8,967 being homologous among all 
species examined, and 4,042 (21.8%) being ‘unique’ to A. suum (see 
Fig. 1). Of the genes with homology to C. elegans or B. malayi, ~50% 


Figure 1 phylogram labels: Ascaris suum (animal parasite); Brugia malayi (animal parasite); Caenorhabditis elegans (free-living); Pristionchus pacificus (free-living); Meloidogyne hapla (plant parasite). 

Figure 1 | Venn diagram summarizing the overlapping homology between 
the Ascaris suum gene set and those of other nematodes. Grey box (right) 
represents genes unique to A. suum, relative to Brugia malayi (red circle), 
Caenorhabditis elegans (blue circle), Meloidogyne hapla (green arc) and/or 
Pristionchus pacificus (yellow circle). The phylogram (left) displays the 
evolutionary relationships currently proposed among the nematodes. 
and 44%, respectively, were determined to represent one-to-one ortho- 
logues (see Supplementary Data 1). For these orthologues (on scaf- 
folds exceeding one megabase, 1 Mb, in size), we explored synteny for 
A. suum and B. malayi by pairwise comparison with C. elegans (see 
Supplementary Data 1). The findings show that interchromosomal 
gene rearrangements in A. suum are relatively rare and occurred 
less frequently in A. suum than in B. malayi relative to C. elegans 
since their evolutionary divergence. In contrast, intrachromosomal 
rearrangements were relatively common and comparable in frequency 
to those inferred for B. malayi. Overall synteny was significantly 
higher between A. suum and B. malayi (~15%) than between either 
species and C. elegans (~3%), which is consistent with current 
knowledge of the evolutionary relationships among these three 
species. Interestingly, of these C. elegans orthologous genes, 532 and 
483 were exclusive to the current assemblies of the A. suum and 
B. malayi genomes, respectively (Supplementary Data 2). Although 
there were no homology matches between these two exclusive subsets 
of orthologues, they shared striking similarity in functional ontology 
(biological process), being linked predominantly to growth, reproduc- 
tion, development and/or morphogenesis. There is clear evidence of 
plasticity in the germline of metazoans, with cases of products from 
non-homologous genes in different species having analogous func- 
tion(s). Therefore, we hypothesize that these two unique gene subsets 
relate to differences in reproductive biology (oviparity versus viviparity) 
and life history (direct versus indirect) between A. suum and B. malayi. 
Clearly, this proposal warrants testing and functional validation in 
C. elegans and/or in Ascaris. 

Of the entire A. suum gene set, 2,370 genes had an orthologue 
(BLASTp cut-off ≤10⁻⁵) belonging to one of 279 known biological 
(KEGG; see online-only Methods) pathways (Supplementary Data 3). 
Mapping to pathways in C. elegans indicated a full complement of 
molecules; by inference, the vast majority (95%) of the A. suum 
euchromatin is represented in the present genomic assembly, an infer- 
ence that is supported by our transcriptomic data (Supplementary 
Tables 4 and 5). We were able to assign possible functions (such as 
for enzymes, receptors, channels and transporters; Supplementary Fig. 
3, Supplementary Table 6 and Supplementary Data 4) to 13,503 
(72.8%) of the genes predicted for A. suum (Fig. 2). For these genes, 
we predicted 456 peptidases belonging to five major classes (aspartic, 
cysteine, metallo-, serine and threonine), with the metallo- (n = 184: 


Figure 2 legend categories (axis scale 0 to 20,000): GTPases; phosphatases; peptidases; kinases; other enzymes; cytokines and ligands; glycans and proteoglycans; other membrane proteins; other receptors; GPCRs; other transporters; channels/pores; ribosome-associated proteins; transcription factors; DNA replication and repair; folding and chaperone; chromosome-associated proteins; structural proteins; hypothetical; other. 


Figure 2 | The major protein classes representing the Ascaris suum gene set. 
41.0%) and serine proteases (n = 132: 30.0%) predominating (Sup- 
plementary Data 4). Notably, the secreted peptidases (such as the 
M12 ‘astacins’, the S9 and S33 serine proteases, and the C1 and C2 
cysteine proteases) are abundantly represented, and have key roles in 
tissue invasion and degradation (for example, during migration and/or 
feeding) and/or immune evasion/modulation in many parasites. 

In addition, we identified 609 kinases and 257 phosphatases 
(Supplementary Data 4). All major classes of kinases are 
represented, with the tyrosine (TK: n = 94), casein (CK1: n= 83), 
CMGC (n=67) and CAMK (n=54) being most abundant in 
A. suum. The phosphatome includes 17 receptor and 68 conventional 
tyrosine, 64 serine/threonine and 39 dual-specificity phosphatases. On 
the basis of homology with molecules in C. elegans, 169 GTPases are 
encoded in the A. suum genome, including 135 small GTPases (Ras 
superfamily) representing the Rab (n = 36), Ras (n = 35; plus 8 Ras- 
like), Rho (n = 17; plus 9 Rho-like) or Ran (n = 6) subfamilies. 
Examples of these homologues include eft-1, fzo-1, glo-1 and rho-1, 
which have essential roles in embryonic, larval and/or reproductive 
development (see www.wormbase.org). 

Given their key roles, many of these enzymes are proposed as targets 
for anti-parasitic compounds and/or vaccines. Equally, the range of 
receptor and channel proteins identified here is of interest because 
many common anthelmintics bind such targets. Here, we predicted 
279 G protein-coupled receptors (GPCRs) for A. suum and 477 channel 
or pore proteins (Supplementary Data 4), including 272 voltage-gated 
and 98 ligand-gated ion channels. Many voltage-gated ion channels are 
known targets for nematocidal drugs, such as macrocyclic lactones (for 
example, ivermectin) and levamisole, and an aminoacetonitrile deriv- 
ative, monepantel, is the most recent example of a highly effective 
nematocide that binds to a ligand-gated ion channel. Importantly, 
in the A. suum gene set, we found a homologue (acr-23) of the 
C. elegans monepantel receptor, suggesting that this drug may kill 
A. suum. In addition, we detected 462 transporters (for example, small 
molecule porter proteins), of which the major facilitator (n = 155), 
cation symporter (n= 71) and resistance-nodulation-cell division 
(n = 56) superfamilies were most abundant (Supplementary Data 4). 

Excretory/secretory (E/S) peptides are central to understanding 
parasite-host interactions. We predicted the secretome of A. suum 
to comprise 775 proteins with diverse functions (Supplementary 
Data 5). Notable among them are 68 secreted proteases, including 
20 SC clan serine proteases (S9 and S33 families), 18 MA clan 
metallo-proteases (M10, M12 and M41 families) and 5 CA/CD clan 
cysteine proteases (C1 and C13 families); see http://merops.sanger. 
ac.uk/ for clan definitions. 

Secreted proteases have known roles in host-tissue degradation, 
required for feeding, tissue-penetration and/or larval migration for a 
range of helminths, including Ascaris. In addition, they are involved 
in inducing and modulating host immune responses against parasitic 
helminths, which are often Th2-biased. From the current under- 
standing of these responses, we compiled a comprehensive list of 
A. suum E/S proteins homologous to helminth-secreted peptides with 
important immunogenic or immunomodulatory roles in host animals 
(Supplementary Table 7 and Supplementary Data 6). Such homolo- 
gues represent about half of the predicted A. suum secretome. Most 
abundant among them are O-linked glycosylated proteins (n = 300), 
many of which are heavily targeted by immunoglobulin (Ig) M anti- 
bodies and bound by various pattern recognition receptors associated 
with host dendritic cells responsible for the induction of a Th2 
immune response. 

Other members of the A. suum secretome are predicted to direct 
or evade immune responses. These peptides include a close homologue 
of the E/S-62 leucyl aminopeptidase of the filarioid nematode 
Acanthocheilonema viteae, which has been shown to inhibit B-cell, 
T-cell and mast cell proliferation/responses, promote an alternative 
activation of the host macrophages, through the inhibition of the Toll- 
like receptor signalling pathway, and induce a Th2 response through 
the inhibition of IL-12p70 production by dendritic cells. Additional 
immunomodulatory molecules predicted for A. suum (BLASTp 
cut-off ≤10⁻⁵) include homologues of another B-cell inhibitor (that 
is, the B. malayi cystatin CPI-2), several TGF-β and macrophage 
initiation factor mimics, numerous neutrophil inhibitors, various 
oxidoreductases, and five close homologues of platelet anti-inflammatory 
factor % (ref. 15). Some A. suum E/S peptides are predicted to be 
involved in immune evasion; for instance, some mask parasite antigens 
by mimicking host molecules (such as several C-type lectins with close 
homology to vertebrate macrophage mannose receptors or CD23, the 
low-affinity IgE receptor). 

Taken together, these data indicate that A. suum has a large arsenal of 
E/S proteins that are likely to be involved directly in manipulating, block- 
ing and/or evading immune responses in the host. Understanding the 
immunomolecular interplay between A. suum and its host, early in 
infection, particularly during hepatopulmonary migration, should pave 
the way for designing prophylactic interventions, such as vaccination. 

Ascaris larvae undertake an extensive migration through their host’s 
body before they establish as adults in the small intestine. Following the 
ingestion of infective eggs and their gastric passage, third-stage larvae 
(L3s) hatch from eggs in the gut and penetrate the intestinal wall; they 
then undergo, via the bloodstream, an arduous hepatopulmonary 
migration. The complexity of this migration coincides with important 
developmental changes in the nematode. Clearly, this migration 
requires tightly regulated transcriptional changes in the parasite. We 
explored this aspect by characterizing the transcription profiles of 
infective L3s (from eggs), L3s from the liver or lungs of the host, and 
fourth-stage larvae (L4s) from the small intestine (Supplementary 
Fig. 4, Supplementary Data 7). Notable among genes enriched during 
larval migration are various secreted peptidases linked to tissue- 
penetration and degradation during feeding and/or migration, 
including three C1/C2, five M1, eight M12, fourteen S9 and five S33 
clan members. Considering the complex nature of larval migration, a 
key role for molecules associated with chemosensory pathways is 
highly likely. Such molecules have been studied extensively in 
C. elegans, with numerous homologues being identified here in larval 
transcripts (Supplementary Data 7). With few exceptions, all of these 
homologues relate to olfactory chemosensation of volatile compounds 
(for example, alcohols, aldehydes or ketones), suggesting that the 
olfactory detection of molecular gradients is central to the navigation 
of A. suum larvae during migration. Lastly, considering the substantial 
host attack against migrating Ascaris larvae, E/S proteins probably 
play crucial roles in immune modulation and/or evasion during 
hepatopulmonary migration. Many such genes, including Bm-alt-1, 
Bm-cpi-2 and mif-4, are highly transcribed in A. suum larvae (see 
Supplementary Data 7), particularly in migrating L3s. 

Because of the large size of the adult nematode (10-15 cm), we were 
able to explore transcription in the musculature and reproductive 
tracts of adult male and female A. suum individuals as well (Sup- 
plementary Fig. 5 and Supplementary Data 8). Among the male- 
enriched transcripts is a range of genes associated specifically with 
sperm and/or spermatogenesis, including fer-1, spe-4, spe-6, spe-9, 
spe-10, spe-15 and spe-41, alg-4 and msp-57 (see www.wormbase.org). 
Notable among the female-enriched transcripts is a large variety of 
genes associated with oogenesis/egg-laying (such as cat-1, unc-54, 
cbd-1 and pqn-74), vulval development (such as noah-1, nhr-25, 
cog-1 and pax-3) and/or embryogenesis (such as cam-1 and unc-6; 
see www.wormbase.org). Although the functions of these genes have 
been explored in C. elegans (primarily a hermaphroditic nematode), 
this detailed insight into the tissue-specific transcription for a 
dioecious nematode is a major advance. 

Analyses of these RNA-seq data revealed 163,777 single nucleotide 
polymorphisms (SNPs) in coding regions of the A. suum genome; 61% 
of them were synonymous, 7% non-synonymous and <0.1% termina- 
tion codons (Supplementary Data 9). Some of the most variable genes 
in A. suum encoded ribosomal proteins (n = 44), translation initiation 
factor (tif) eIF-3 subunits 3 and 5, tif TFIIH subunit H2 and tif IF-2, 
galectin-4 and galectin-9, the latter two of which are probably linked to 
immune evasion and may indicate that antigenic variation is among 
the many strategies apparent in Ascaris to combat the host immune 
response. Interestingly, the high nucleotide variability linked to the 
key elements of translation machinery did not relate to a bias in syn- 
onymous SNPs, suggesting that many mutations accumulate in 
particular ‘hotspots’ and/or are tolerated, but do not compromise 
either the structure or the function of this machinery. The least variable 
genes encoded various (druggable) serine/threonine phosphatases 
(n = 17) as well as numerous receptors, channels and transporters, 
for which there was an unusually strong bias towards synonymous 
SNPs, reinforcing their potential as intervention targets. 

Given our present reliance on a small number of drugs (for example, 
piperazine, pyrantel, albendazole and mebendazole) for the treat- 
ment of ascariasis, their repeated or excessive use might lead to resist- 
ance in Ascaris populations to some or all of these compounds. As 
few new anthelmintics (that is, aminoacetonitrile derivatives and cycloocto- 
depsipeptides) have been discovered in the past two decades using 
traditional screening methods, an effective, alternative means of drug 
discovery is urgently needed. Genome-guided drug target or drug 
discovery has major potential to complement conventional screening 
and re-purposing. The goal of genome-guided analysis is to identify 
genes or molecules whose inactivation by one or more drugs will 
selectively kill parasites but not harm their host. 

Because most parasitic nematodes are difficult to produce or maintain 
outside of their host, or to subject to gene-specific silencing by RNAi or 
morpholinos, direct functional assessment of essentiality (that is, whether 
a gene is needed for the nematode’s survival) is not yet practical. However, 
essentiality can be inferred from functional information for model 
organisms (for example, lethality in C. elegans and D. melanogaster), 
and this approach has indeed yielded effective targets for nematocides. 
In Ascaris, we identified 629 proteins (Supplementary Data 10) with 
essential homologues in C. elegans and D. melanogaster (linked to lethal 
phenotypes upon gene perturbation). Among these are 87 channels or 
transporters (including 44 voltage-gated ion channels), which represent 
protein classes most successfully targeted for anthelmintic compounds, 
including macrocyclic lactones, levamisoles and aminoacetonitrile 


Protein or chokepoint    Subtype (number of molecules)    Total number 
GTPase    Small GTPase (22); Ras (13); Rab (5); Rho (3); Ras-like (1); others (2)    46 
Kinase    TK (8); AGC (3); CAMK (2); TKL (2); STE (1); other (1)    17 
Peptidase    A22A (5); M14B (3); M12B (2); M67A (1); C14A (1); C50 (1); M12A (1); M13 (1); T01A (1); C46 (1); S33 (1)    19 
Phosphatase    STP (28); cPTP (4); DSP (3)    35 
Transporters and channels    Channels and pores (30); primary active transporters (24); incompletely characterized transport system proteins (22); accessory factors involved in transport (5); electrochemical potential-driven transporters (5); group translocators (1)    87 
‘Lethal’ chokepoints    G protein-coupled receptor kinase; phosphoribosylformylglycinamidine synthase; inosine-5'-monophosphate dehydrogenase; phospho-N-acetylmuramoyl pentapeptide transferase; CDP-diacylglycerol-inositol 3-phosphatidyltransferase    5 

Candidates were inferred from essentiality prediction and metabolic chokepoint analysis. 
derivatives. Also notable are 46 GTPases, 35 phosphatases (includ- 
ing PP1 and PP2A homologues, as targets for norcantharidin 
analogues), 17 kinases and 19 peptidases (Table 2). 

In addition to essentiality-based prediction, an alternative strategy 
has been to infer enzymatic chokepoints intrinsic to the complete 
metabolome of a parasite. Such chokepoints are defined as enzymatic 
reactions that uniquely produce and/or consume a molecular com- 
pound, on the rationale that the disruption of such enzymes would 
lead to the toxic build-up (that is, for unique substrates) or starvation 
(that is, for unique products) of metabolites within cells. Pathway 
analysis identified 225 likely chokepoints linked to genes predicted 
to be essential in A. suum (Supplementary Data 10). We gave the highest 
priority to targets predicted from single-copy genes in the A. suum 
genome, reasoning that lower allelic variability would exist within 
populations and would thus be less likely to give rise to drug resistance. 
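
The chokepoint definition above (a reaction that is the only consumer or the only producer of some compound) can be illustrated with a small sketch. The function name `find_chokepoints`, the input layout and the toy reaction names are assumptions made for illustration; this is not the pathway software used in the study.

```python
from collections import defaultdict


def find_chokepoints(reactions):
    """Return the set of reactions that uniquely consume or produce a compound.

    reactions: dict mapping reaction name -> (set of substrates, set of products).
    """
    consumers = defaultdict(set)  # compound -> reactions that consume it
    producers = defaultdict(set)  # compound -> reactions that produce it
    for name, (substrates, products) in reactions.items():
        for compound in substrates:
            consumers[compound].add(name)
        for compound in products:
            producers[compound].add(name)

    chokepoints = set()
    for reaction_set in list(consumers.values()) + list(producers.values()):
        if len(reaction_set) == 1:  # only one reaction touches this compound
            chokepoints |= reaction_set
    return chokepoints


# Toy network: R1 and R2 are redundant; R3 alone consumes B and produces C.
toy = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"A"}, {"B"}),
    "R3": ({"B"}, {"C"}),
}
toy_chokepoints = find_chokepoints(toy)
```

In the study, candidates of this kind were further restricted to genes predicted to be essential and prioritized when encoded by single-copy genes; that filtering is not shown here.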

Using this strategy, we identified five high-priority drug targets for 
A. suum (see Table 2 and Supplementary Data 10) that, given their 
conservation with C. elegans and D. melanogaster, are likely to be rel- 
evant in relation to many other parasitic helminths. Conspicuous 
among them is IMP dehydrogenase (GMP reductase), which has a 
variety of inhibitors (for example, mycophenolic acid analogues) that 
could be tested for ascaricidal effects. Clearly, the druggable genome of 
Ascaris now provides a solid basis for rational drug design, aimed at 
controlling parasitic nematodes of major socioeconomic impact 
worldwide. 

In conclusion, we have characterized the genome of A. suum, a 
major parasite of one of the world’s most important food animals 
(pig) and the closest relative of A. lumbricoides, which infects about 
1.2 billion people globally. Intriguingly, the present A. suum draft 
genome exhibits unusually low repeat content and lacks Tas2 trans- 
posons. These characteristics probably relate to the chromatin dimi- 
nution described previously for some ascaridoids, indicating that our 
assembly represents the somatic genome of this parasite. The precise 
mechanism governing this diminution is not yet understood. 
Although the chromatin lost during this process is not fully character- 
ized, there appears to be a significant loss in repeat content, consistent 
with the present assembly. Notably, the present gene set inferred for 
A. suum includes fert-1 and rpS19G, which, although originally pro- 
posed to be germline-specific, were transcribed in all adult libraries 
sequenced here. This finding suggests that the genomic content lost 
during diminution might vary among individuals or tissues, and is a 
stimulus to investigate chromatin diminution between and among 
individual cells (that is, sperm or eggs), stages and tissue types of 
A. suum. Importantly, the present study, showing that a high-quality 
genomic assembly can be achieved using an approach based on whole- 
genome amplification, provides unique prospects for exploring dimi- 
nution in detail, using the present genome as a reference. 

In addition, our sequencing effort has characterized a broad range of 
key classes of molecules of major relevance to understanding the 
molecular biology of A. suum and the exquisite complexities of the 
host-parasite interplay on an immunobiological level. This work paves 
the way for future fundamental molecular explorations and the design 
of new methods for the treatment and control of one of the world’s most 
important parasitic nematodes. This focus is now crucial, given the 
major impact of Ascaris and other soil-transmitted helminths, which 
affect billions of people and animals worldwide. Although these para- 
sites are seriously neglected, genomic and post-genomic approaches 
provide new hope for the discovery of intervention strategies, with 
major implications for improving global health. 


METHODS SUMMARY 


We sequenced the genome of A. suum using Illumina technology from genomic 
DNA from the reproductive tract of a single adult female. From six paired-end 
sequencing libraries (insert sizes: 0.17 kb to 10 kb; see Supplementary Tables 1 and 
2), we generated 39 Gb of useable short-read sequence data, equating to ~80-fold 
coverage of the 273-Mb genome. We assembled the short reads, constructed 
scaffolds in a step-by-step manner, and then closed intra-scaffold gaps. 
Transposable elements, non-coding RNAs and the protein-coding gene set were 
inferred using a combination of predictive modelling and a homology-based 
approach. Orthology and synteny analyses were conducted using established 
methods. We sequenced messenger RNA from infective L3s (from eggs), 
migrating L3s from the liver or lungs of the host, and L4s from the small intestine, 
as well as muscle and reproductive tissues from adult male and female A. suum, 
and used these data to aid gene predictions, define SNPs and explore key molecules 
associated with larval migration, reproduction and development. All proteins 
predicted from the gene set were annotated using databases for conserved protein 
domains, gene ontology annotations and model organisms (that is, Caenorhabditis 
elegans, Drosophila melanogaster and Mus musculus). Essentiality and drug target 
predictions were conducted using established or in-house methods. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 


Sample procurement, preparation and storage. All specimens of A. suum were 
collected from pigs (Sus scrofa) with naturally acquired infections in Victoria, 
Australia (adult nematodes) and Ghent, Belgium (larval stages). L3s and L4s were 
also collected from the liver or lung and from the small intestine, respectively, of 
pigs, using established procedures. Nematodes were washed extensively in 
sterile physiological saline (37 °C), snap-frozen in liquid nitrogen and then stored 
at −70 °C until use. 

DNA isolation, sequencing and quality control. Total genomic DNA was iso- 
lated from the reproductive tract of a single adult female of A. suum using a 
sodium-dodecyl sulphate/proteinase K digestion followed by phenol-chlo- 
roform extraction and ethanol precipitation. Total DNA yield was determined 
using the Qubit fluorometer double-stranded DNA HS Kit (Invitrogen). DNA 
integrity was verified with a 2100 Bioanalyser (Agilent). Short-insert (170 bp and 
500 bp) and mate-pair (800 bp, 2 kb, 5 kb and 10 kb) genomic DNA libraries were 
prepared and paired-end sequenced using TruSeq chemistry on a HiSeq 2000 
(Illumina). Whole-genome amplification, employing the REPLI-g Midi Kit 
(Qiagen), was used to produce (from 200 ng of genomic template) the required 
amount of DNA for the construction of the 2-kb, 5-kb and 10-kb libraries 
(Supplementary Fig. 6). The sequence data generated from each of the six lib- 
raries were verified, and low-quality sequences, base-calling duplicates and adap- 
ters removed. The size of the genome and the heterozygosity rate were estimated 
by establishing the frequency of occurrence of each 17-bp k-mer (a unique 
sequence of k (that is, 17) nucleotides in length) within the genomic sequence 
data set (from the 170-bp library) using an established method. Genome size was 
estimated using a modification of the Lander-Waterman algorithm, in which the 
haploid genome length in base pairs is G = (N × (L − K + 1) − B)/D, where N is 
the read length sequenced in base pairs, L is the mean length of sequence reads, K 
is the k-mer length (17 bp) and B is the number of k-mers occurring fewer than four 
times (Supplementary Fig. 7). Heterozygosity was evaluated throughout the 
genome assembly by assessing the distribution of the k-mer frequency in the 
sequence data set. 
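
The genome-size formula above can be written out as a short worked example. This is a sketch under stated assumptions: the text does not define D, which is taken here to be the peak depth of the k-mer frequency distribution, and N is read as the number of reads so that N × (L − K + 1) counts the k-mers sampled. The function name and the input values are illustrative and are not the values used for A. suum.

```python
def estimate_genome_size(n_reads, mean_read_length, k, low_freq_kmers, peak_depth):
    """Estimate haploid genome size G = (N * (L - K + 1) - B) / D.

    n_reads          N: number of reads (assumed interpretation of N)
    mean_read_length L: mean read length in bp
    k                K: k-mer length
    low_freq_kmers   B: k-mers occurring fewer than four times (likely errors)
    peak_depth       D: peak k-mer depth (assumed; not defined in the text)
    """
    if peak_depth <= 0:
        raise ValueError("peak_depth must be positive")
    usable_kmers = n_reads * (mean_read_length - k + 1) - low_freq_kmers
    return usable_kmers / peak_depth


# Illustrative numbers only: 1,000,000 reads of 100 bp, k = 17,
# 4,000,000 low-frequency k-mers and a peak depth of 40.
toy_estimate = estimate_genome_size(1_000_000, 100, 17, 4_000_000, 40)
```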

RNA isolation, sequencing and assembly. We obtained total RNAs from egg-L3s 
(n ~ 500,000), liver-L3s (n ~ 60,000), lung-L3s (n ~ 80,000) or L4s (n ~ 30,000) 
and from the somatic musculature or reproductive tract of each of two adult male 
and two adult female A. suum using the TriPure reagent (Roche), and both yield 
and quality were verified by 2100 BioAnalyser (Agilent). Polyadenylated 
(polyA+) RNA was purified from 10 µg of total RNA using Sera-mag oligo(dT) 
beads, fragmented to a size of 300-500 bp, reverse-transcribed using random 
hexamers, end-repaired and adaptor-ligated, according to the manufacturer’s 
protocol (Illumina). Ligated products of ~400 bp were excised from agarose 
and then PCR-amplified (15 cycles), as recommended. Products were purified 
over a MinElute column (Qiagen) and subjected to paired-end RNA-seq using 
TruSeq chemistry on a HiSeq 2000 (Illumina) and assessed for quality and adaptor 
sequence. Transcripts were assembled from RNA-seq data using Oases. All 
transcripts were used to assess the completeness of the genome assembly and to 
predict genes. 

Genomic assembly and quality control. Following sequencing, all DNA- 
sequence reads were corrected based on k-mer (k = 17) distribution. Briefly, 
sequence reads were removed if >10% of bases were ambiguous (represented 
by the letter N) or were multiple adenosine monophosphates (poly-A), and all remain- 
ing reads were filtered on the basis of Phred quality. For small insert-size libraries 
(that is, <800 bp), additional reads were removed from the final data set if >65% 
of bases were of a low Phred quality (<8). For large insert libraries (2 kb, 5 kb and 
10 kb), reads were removed from the final data set if >80% of bases were of a low 
Phred quality (<8). Duplicate (that is, identical) reads and partial reads represent- 
ing the Illumina adaptor sequence were also removed, as were reads from the 
500-bp library representing paired reads found to overlap by >10 bp (allowing for 
a 10% mismatch). Corrected and filtered data were assembled into contigs using 
SOAPdenovo, and joined iteratively into scaffolds using a step-wise process (see 
Supplementary Fig. 8), using the paired reads generated from each library; local 
assemblies were used to close all gaps. Each nucleotide position in the final assembly 
was assessed for accuracy by aligning all filtered reads to the scaffolds using 
SOAP2aligner, allowing for up to five mismatches per read. The depth of coverage 
and repeat content were assessed initially by sliding-window analysis and presented 
as a frequency distribution (Supplementary Fig. 9). GC-content was estimated using 
10-kb non-overlapping sliding windows, and GC-bias was assessed based on a 
frequency distribution of these data (Supplementary Fig. 10). To assess the com- 
pleteness of the genome assembly, RNA-seq data representing each of the organs 
(that is, musculature and reproductive tract), genders and/or stages of A. suum 
sequenced were mapped to the final assembly using the BLAST-like Alignment 
Tool (BLAT). 
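
The per-read filtering thresholds described above can be expressed as a small predicate. This is a minimal sketch, not the authors' software: the function name `passes_read_filter` and its parameters are hypothetical, quality values are assumed to be already decoded to integer Phred scores, and the poly-A, duplicate and adaptor checks are left out.

```python
def passes_read_filter(seq, quals, max_ambiguous=0.10,
                       max_low_quality=0.65, low_phred=8):
    """Return True if a read survives the ambiguity and Phred-quality filters.

    seq    : read sequence (string of A/C/G/T/N)
    quals  : per-base integer Phred scores, same length as seq
    max_low_quality : 0.65 for small-insert libraries; 0.80 for the
                      2-kb, 5-kb and 10-kb libraries, as described in the text
    """
    if not seq or len(seq) != len(quals):
        raise ValueError("sequence and qualities must be non-empty and equal in length")

    # Remove reads in which more than 10% of bases are ambiguous (N).
    if seq.upper().count("N") / len(seq) > max_ambiguous:
        return False

    # Remove reads in which too many bases have a low Phred score (< 8).
    low_fraction = sum(q < low_phred for q in quals) / len(quals)
    return low_fraction <= max_low_quality
```

For a large-insert library the same function would be called with `max_low_quality=0.80`.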


Assessment of repeat content and annotation of non-coding RNA. Following 
genome assembly, tandem repeats were identified using the Tandem Repeats 
Finder program. Transposable elements were predicted using a combination 
of homology-based comparisons (using RepeatMasker) and de novo approaches 
(using LTR_FINDER, PILER and RepeatScout), with a consensus population 
of predicted repetitive elements, constructed in RepeatScout using fit-preferred 
alignment scores. Low-frequency repeats (≤25) and multi-copy genes (in the 
repeat element library) were filtered using RepeatMasker, producing a non- 
redundant sequence file, which was then used to identify and classify additional 
homologous repeats in the genome. 

Gene prediction, and synteny and genetic variation analysis. The A. suum 
protein-coding gene set was inferred using de novo-, homology- and evidence- 
based (that is, transcriptomic) approaches. De novo gene prediction was performed 
on a repeat-masked genome using three programs (Augustus, GlimmerHMM and 
SNAP); training models were generated from a subset of the transcriptomic data 
set representing 1,355 distinct genes. Homology-based prediction was conducted 
by comparison with complete genomic data for Caenorhabditis elegans, 
Pristionchus pacificus and Brugia malayi using a multi-phase strategy, in which 
(1) all putative homologous gene sequences were preliminarily identified from 
alignments with protein sequences representing the complete gene set of each of 
the reference genomes (the longest transcripts were chosen to represent each gene) 
by TblastN (e-value cut-off: 10-°) and grouped into gene-like structures using 
genBlastA”; (2) regions representing these putative genes, and flanking regions 
(3,000 bp) at the 5’- and 3'-ends of each predicted gene, were extracted from the 
assembly and aligned to the ‘parent’ sequences derived from the reference genomes 
using Genewise”; (3) all single-exon genes predicted to have arisen from a retro- 
transposition and containing at least one frame-shift error or representing incom- 
plete coding domains of <150 bp as well as all multi-exon genes containing more 
than two frame-shift errors and/or representing incomplete coding domains of 
<100 bp, were discarded. Evidence-based gene prediction was conducted by align- 
ing all RNA-seq data generated herein against the assembled genome using 
TopHat"’, with cDNAs predicted from the resultant data using Cufflinks’. 
Following the prediction of genes, a non-redundant gene set representing homo- 
logy-based, de-novo-predicted and RNA-seq-supported genes, was generated 
using Glean (http://sourceforge.net/projects/glean-gene)°. All Glean-predicted 
genes were retained, as were all genes supported by RNA-seq data and those 
predicted using two or more de novo methods (that is, Augustus, 
GlimmerHMM and/or SNAP). The open reading frame of each gene was predicted 
using BestORF (www.softberry.com). To assess the quality and accuracy of the 
predicted gene set, we examined the length-distribution of all genes, coding 
sequences, exons and introns, and the distribution of exon numbers for individual 
genes, and then compared these parameters with those calculated for the published 
gene sets of B. malayi, C. elegans, P. pacificus and M. incognita (Supplementary 
Fig. 4). 

Following prediction of the finalized gene set, we conducted pairwise analysis 
of the overall synteny existing between/among the large (>1 Mb) assembly scaf- 
folds for B. malayi and A. suum relative to the complete C. elegans chromosomes. 
This analysis was undertaken by conducting pairwise alignments among all 
A. suum or B. malayi (WS220 assembly: ftp://ftp.sanger.ac.uk/pub2/wormbase/ 
releases/WS220/genomes/b_malayi/) scaffolds larger than 1 Mb in size and the 
C. elegans chromosomes using LASTz (http://www.bx.psu.edu/miller_lab/dist/ 
README lastz-1.02.00/README.lastz-1.02.00a.html), which were then joined 
using CHAINNET™ and output as a .axt alignment from which large-syntenic 
regions were defined. The resulting alignment files were used to construct synteny 
images in scalable vector graphics format using customized Perl scripts (ZX). In 
addition, gene-level synteny analyses were conducted for one-to-one orthologous 
genes colocalizing to large A. suum or B. malayi assembly scaffolds (>1 Mb) 
according to ref. 9. Orthology was determined by pairwise reciprocal BLASTx 
comparisons between A. suum, B. malayi and C. elegans according to ref. 11. 
One-to-one orthologous genes shared between either A. suum or B. malayi and 
C. elegans but not shared between A. suum and B. malayi based on reciprocal 
BLASTp analysis were further confirmed by Hidden Markov Modelling using 
the jackhmmer command in the program HMMER3.0 (ref. 50) and a highly 
permissive threshold (HMM cutoff: 10-7). 
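
The reciprocal comparison step can be illustrated with a small sketch. The exact orthology criteria follow ref. 11 and are not restated in the text, so the reciprocal-best-hit rule below is an assumption used for illustration only; the function names and the `(query, subject, evalue)` tuple layout are hypothetical.

```python
def best_hits(hits):
    """Map each query to its lowest-e-value subject.

    hits: iterable of (query, subject, evalue) tuples, for example parsed
    from tabular BLAST output (the parsing itself is not shown here).
    """
    best = {}
    for query, subject, evalue in hits:
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs in which each gene is the other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)
```

Pairs found for one species pair but absent for another would be the candidates passed on to the profile-based confirmation described above.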

We assessed the genome-wide variation in the exonic regions by mapping all 
raw reads from our transcriptomic data to the genomic coding domains using 
Maq”’, and calling SNPs with a minimum coverage threshold of ten reads. All 
mapped reads were assessed as synonymous (non-coding change), non-synonymous 
(coding change) or ambiguous (a SNP that was represented in our data set as an 
ambiguous IUPAC code wherein one nucleotide change would cause a synonym- 
ous mutation and the other a non-synonymous mutation) using a custom Perl 
script (snp_analysis.pl). All genes were then ranked based on their accumulation of 
SNPs to assess and identify their levels of conservation/variation relative to their 
function. We reasoned that, in addition to the real effects of the variability of each 
gene on their accumulation of SNPs, these data would be influenced also by the 
coverage achieved for each gene, which is affected by the number of reads available 
for each gene (that is, their relative levels of transcription) and the length of each 
gene. Thus, before ranking, the SNP data for each gene was normalized for its 
calculated reads per kilobase per million reads (RPKM) and total gene length using 
the simple equation: SNPs per read per kilobase = total SNPs divided by RPKM 
divided by gene length (in bp) multiplied by 1,000 bp. Following ranking, we 
explored function among the 2.5% most variable (with the highest rankings based 
on normalized SNP data) and most conserved genes (with the lowest rankings 
based on normalized SNP data). Noting the potential inaccuracy associated with 
estimating the normalized SNP rankings of lowly transcribed genes (owing to a 
lack of data/coverage), only genes for which at least 100 reads were available were 
considered in these functional comparisons. 
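
The normalization and ranking just described can be written out as a short sketch. The formula is taken directly from the text; the dictionary keys, the function names and the rule for rounding the 2.5% tail to a whole number of genes are assumptions made for illustration.

```python
def normalized_snp_rate(total_snps, rpkm, gene_length_bp):
    """SNPs per read per kilobase = total SNPs / RPKM / gene length (bp) * 1,000."""
    return total_snps / rpkm / gene_length_bp * 1000


def rank_genes(genes, min_reads=100, tail_fraction=0.025):
    """Return (most_variable, most_conserved) gene lists.

    genes: list of dicts with hypothetical keys
    'name', 'snps', 'rpkm', 'length_bp' and 'reads'.
    Genes with fewer than min_reads reads are excluded, as in the text.
    """
    eligible = [g for g in genes if g["reads"] >= min_reads and g["rpkm"] > 0]
    ranked = sorted(
        eligible,
        key=lambda g: normalized_snp_rate(g["snps"], g["rpkm"], g["length_bp"]),
        reverse=True,
    )
    if not ranked:
        return [], []
    # Assumption: the tail size is rounded and is never smaller than one gene.
    k = max(1, round(len(ranked) * tail_fraction))
    return ranked[:k], ranked[-k:]
```

For example, a gene with 10 SNPs, an RPKM of 5 and a length of 2,000 bp has a normalized value of 10 / 5 / 2,000 × 1,000 = 1.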

Functional annotation of coding genes. Following the prediction of the protein- 
coding gene set, each inferred amino acid sequence was assessed for conserved 
protein domains in the SProt, Pfam, PRINTS, PROSITE, ProDom and SMART 
databases using InterProScan™, employing default settings. Gene ontology cat- 
egories” were assigned to each contig inferred to contain at least one conserved 
protein domain. Gene ontology categories were summarized and standardized to 
level 2 and level 3 terms, defined using the GOslim hierarchy” using WEGO”. To 
characterize further the contigs/transcripts from A. suum, we conducted a series of 
high-stringency BLASTp homology searches (e-value cut-off: 10°) against a 
variety of databases. Each contig was assessed for a known functional orthologue, 
defined using the Kyoto Encyclopaedia of Genes and Genomes (KEGG) 
(www.kegg.com). Where appropriate, orthologous matches were mapped visually 
to a defined pathway using the KEGG pathway tool (available via www.kegg.com) 
or clustered to a known protein family using the KEGG-BRITE hierarchy tool 
(available via www.kegg.com). In addition, the amino acid sequence inferred from 
each A. suum coding gene was compared by BLASTp with protein sequences 
available for key nematode species (B. malayi, C. elegans, P. pacificus and 
M. incognita) as well as for Drosophila melanogaster’ and Mus musculus’ and 
those contained within the UniProt*’, SwissProt and TREMBL databases”*. Key 
protein groups (for example, peptidases, kinases, phosphatases, GTPases, GPCRs, 
and transport and channel proteins) were characterized by high-stringency 
BLASTp homology searching (e-value cut-off <10~°) of manually curated 
information sequence data available in the MEROPS”, WormBase, KS-Sarfari 
(https://www.ebi.ac.uk/chembl/sarfari/kinasesarfari) and GPCR-Sarfari (https:// 
www.ebi.ac.uk/chembl/sarfari/gpcrsarfari) and the Transporter Classification 
database’. E/S proteins were predicted using Phobius®', employing both the 
neural network and hidden Markov models, and by BLASTp homology-searching 
of the validated signal peptide database and an E/S database containing pub- 
lished proteomic data for B. malayi****, Schistosoma mansoni® and M. incognita®®. 
In the final annotation, proteins inferred from genes were classified based on a 
homology match (e-value cut-off: <10 °) to: (1) a curated, specialist protein 
database, followed by (2) the KEGG database, followed by (3) the UniProt/ 
SwissProt/TREMBL databases, followed by (4) the annotated gene set for a model 
organism, including C. elegans, D. melanogaster, M. musculus or S. cerevisiae, fol- 
lowed by (5) the gene ontology classification, and, finally, (6) a recognized, con- 
served protein domain based on InterProScan analysis. Any inferred proteins 
lacking a match (BLASTp cut-off =10 °) in at least one of these analyses were 
designated hypothetical proteins. The final annotated protein-coding gene set for 
A. suum is available for download at WormBase (in nucleotide and amino acid 
formats). 
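
The ordered annotation scheme above amounts to taking the first available match from a fixed priority list. The sketch below shows that logic only; the source names and the `matches` dictionary layout are hypothetical labels, not identifiers from the authors' pipeline.

```python
# Priority order as listed in the text, steps (1) to (6).
PRIORITY = [
    "curated_specialist_db",
    "kegg",
    "uniprot_swissprot_trembl",
    "model_organism",
    "gene_ontology",
    "interproscan_domain",
]


def classify_protein(matches):
    """Return (source, annotation) from the highest-priority source with a match.

    matches: dict mapping a source name to an annotation string, or to
    None or an empty string when that analysis returned no match.
    Proteins with no match in any source are designated hypothetical.
    """
    for source in PRIORITY:
        annotation = matches.get(source)
        if annotation:
            return source, annotation
    return None, "hypothetical protein"
```

A protein with both a KEGG and a UniProt match would therefore be labelled from KEGG, because KEGG ranks higher in the list.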

Differential transcription analysis. Following RNA-seq, all paired-end reads for 
each library constructed were aligned to the predicted A. suum gene set using 
TopHat, and quantitative levels of transcription (RPKM)* were calculated using 
Cufflinks. Differential transcription was assessed” using a P-value cut-off of ≤0.01 
and a minimum two-fold difference in absolute RPKM values. False discovery rates 
for differential transcription were determined®. To allow the rapid visual assess- 
ment of the statistically significant changes in transcription of each gene between 
and among individual libraries, we constructed heat-maps representing absolute 
differences in the RPKM values, calculated for each transcript using a customized 
Perl script (express_heatmap_RPKM.pl). Genetic interaction networks were pre- 
dicted® based on data available for homologous genes in C. elegans (inferred from 
BLASTp comparisons) and viewed using the program BioLayout 3D”. 
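
The two thresholds used to call differential transcription can be combined as in the sketch below. It assumes the cut-off is read as P ≤ 0.01 and that the P-value has already been computed by the cited method; the function name and the treatment of zero RPKM values are illustrative assumptions.

```python
def is_differential(rpkm_a, rpkm_b, p_value, p_cutoff=0.01, min_fold=2.0):
    """Apply a P-value cut-off and a minimum fold difference in RPKM."""
    if p_value > p_cutoff:
        return False
    low, high = sorted((rpkm_a, rpkm_b))
    if high == 0:
        # Not transcribed in either library.
        return False
    if low == 0:
        # Assumption: on/off transcription counts as exceeding the fold threshold.
        return True
    return high / low >= min_fold
```

False discovery rate control, which the text applies separately, is not part of this sketch.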

Essentiality and druggability predictions. A. suum genes with homology to 
those in the C. elegans and/or D. melanogaster genomes were inferred based on 
BLASTp comparisons using the predicted protein sequences for individual species 
(e-value cut-off 10° °). Phenotypic data for each C. elegans and D. melanogaster 
homologue were sourced from WormBase and FlyBase (www.flybase.org), 
respectively. A. suum genes determined”! to have homologues with lethal pheno- 
types in both C. elegans and D. melanogaster were inferred to represent essential 
genes. Metabolic chokepoints were defined’” and assessed based on A. suum 
gene sequences determined, by BLASTp comparison (10° °), to have an orthologue 
in the KEGG database. All ‘essential’ homologues and/or molecules in ‘choke- 
points’ were then queried against the BRENDA” and CHEMBL databases 
(accessible via https://www.ebi.ac.uk/chembldb/), to identify known chemical 
inhibitors. 
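
The essentiality rule (a lethal phenotype for homologues in both C. elegans and D. melanogaster) reduces to a two-sided membership test. The sketch below assumes the homologue lists and lethal-phenotype sets have already been compiled from BLASTp results and from WormBase and FlyBase; all names and the data layout are hypothetical.

```python
def infer_essential(homologues, lethal_ce, lethal_dm):
    """Return A. suum genes with lethal homologues in both reference species.

    homologues: dict mapping a gene ID to {'ce': [...], 'dm': [...]},
    listing homologue IDs in C. elegans and D. melanogaster.
    lethal_ce, lethal_dm: sets of homologue IDs with lethal phenotypes.
    """
    essential = []
    for gene, hits in homologues.items():
        has_ce = any(h in lethal_ce for h in hits.get("ce", ()))
        has_dm = any(h in lethal_dm for h in hits.get("dm", ()))
        if has_ce and has_dm:
            essential.append(gene)
    return sorted(essential)
```

Genes returned by such a filter, together with chokepoint enzymes, would be the ones queried against inhibitor databases.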

Additional bioinformatic analyses, and use of software. Data analysis was 
conducted in a Unix environment or Microsoft Excel 2007 using standard com- 
mands. Bioinformatic scripts required to facilitate data analysis were designed 
using Perl, BioPerl, Java and Python and are available via http://research.vet. 
unimelb.edu.au/gasserlab/. 

Commensal microbiota and myelin autoantigen 
cooperate to trigger autoimmune demyelination 

Active multiple sclerosis lesions show inflammatory changes sug- 
gestive of a combined attack by autoreactive T and B lymphocytes 
against brain white matter’. These pathogenic immune cells derive 
from progenitors that are normal, innocuous components of the 
healthy immune repertoire but become autoaggressive upon patho- 
logical activation. The stimuli triggering this autoimmune conver- 
sion have been commonly attributed to environmental factors, in 
particular microbial infection’. However, using the relapsing- 
remitting mouse model of spontaneously developing experimental 
autoimmune encephalomyelitis’, here we show that the commensal 
gut flora—in the absence of pathogenic agents—is essential in trig- 
gering immune processes, leading to a relapsing-remitting auto- 
immune disease driven by myelin-specific CD4+ T cells. We show 
further that recruitment and activation of autoantibody-producing 
B cells from the endogenous immune repertoire depends on avail- 
ability of the target autoantigen, myelin oligodendrocyte glyco- 
protein (MOG), and commensal microbiota. Our observations 
identify a sequence of events triggering organ-specific autoimmune 
disease and these processes may offer novel therapeutic targets. 

The relapsing-remitting (RR) mouse model uses transgenic SJL/J 
mice expressing, in a large proportion of their CD4+ T cells, a trans- 
genic T-cell antigen receptor (TCR) recognizing MOG peptide 92-106 
in the context of MHC class II, I-A^s. These mice spontaneously develop 
experimental autoimmune encephalomyelitis (EAE) with successive 
disease bouts that often affect different central nervous system (CNS) 
tissues. The disease is initiated by the transgenic CD4+ T cells, which 
first infiltrate the CNS, and by MOG-autoantibody-producing B cells 
recruited from the natural immune repertoire’. 

Whereas in our facility close to 80% of RR mice developed spon- 
taneous EAE within 3-8 months of age, the rate was variable in other 
institutions, with spontaneous EAE incidences ranging from 35-90% 
(unpublished data). This recalled previous investigations that also 
observed that the frequency of spontaneous EAE in myelin-specific 
TCR transgenic mice varied in different breeding centres*. Because our 
mice were reared under specific pathogen-free (SPF) conditions, we 
tested the possible contributions of the non-pathogenic commensal 
flora to the triggering of a spontaneous CNS-specific autoimmune 
disease. 

We first compared the incidence of spontaneous EAE between RR 
mice housed under SPF and completely germ-free conditions. The 
differences were marked. Whereas, as reported before, most SPF-bred 
RR mice came down with EAE within 3-8 months’, germ-free RR mice 
remained fully protected throughout their life (Fig. 1a). As the com- 
mensal microbiota have a central function in driving the correct 
development of the immune system’, the absence of spontaneous 
EAE in germ-free RR mice may have reflected a general immune defi- 
ciency due to missing microbial stimuli. However, two observations 
argue against a profound and irreversible non-reactivity. First, RR mice, 
which had been germ free (and disease free) for 6-12 weeks, promptly 
developed EAE when re-colonized with conventional commensal 
microbiota (Fig. 1b). Mono-colonization with segmented filamentous 
bacteria (SFB), which restored autoimmunity in another mouse model, 
was of low efficiency (unpublished data). This suggests that the 
immune system of germ-free mice had grown efficient enough to 
mount a full autoimmune attack within a relatively brief period of time. 
Second, the basic immune competence of germ-free animals was con- 
firmed by active immunization of germ-free wild-type SJL/J mice with 
recombinant MOG (rMOG) in complete Freund’s adjuvant (CFA). In 
accord with one previous report®, although not with another more 
recent one’, all immunized germ-free mice developed EAE like their 
SPF counterparts, although with some delay (Fig. 1c), and transfer of 
pre-activated T cells induced comparable EAE in both germ-free and 
SPF mice (Supplementary Table 1). Moreover, germ-free and SPF SJL/J 
mice immunized with rMOG produced comparable levels of anti- 
MOG antibodies in their serum (Fig. 1d). 

Figure 1 | Commensal microbiota are required for the development of 
spontaneous EAE. a, Incidence of spontaneous EAE in a cohort of RR mice 
housed in germ-free (GF; n = 35) or SPF (n = 41) conditions. b, Incidence of 
spontaneous EAE in germ-free RR mice (n = 10) re-colonized with 
conventional flora from SPF mice. c, Delayed EAE onset in germ-free wild-type 
(GF WT) SJL/J mice immunized with rMOG/CFA. Mean EAE scores (± s.e.m.) 
of germ-free (n = 7) and SPF (n = 8) SJL/J mice are shown. *P < 0.05; 
**P < 0.01 (two-way ANOVA). d, Germ-free and SPF wild-type SJL/J mice 
produce similar levels of anti-MOG antibodies after immunization. Each circle 
represents an individual mouse and bars depict mean ± s.e.m. Panels c and 
d represent two individual experiments. 

Recent studies established that components of the commensal micro- 
biota profoundly shape the gut-associated lymphatic tissue (GALT), 
some supporting differentiation of interleukin (IL)-17-producing 
TH17 cells and others the generation of regulatory T lymphocytes 
(Treg). We also found a marked deficit of TH17-like cells in germ- 
free mice (Fig. 2a), which was most pronounced in T cells intimately 
connected to the intestinal wall, lamina propria T cells and in Peyer’s 
patch but not in mesenteric lymph node populations. There were no 
notable changes in remote organs, such as spleen or pooled inguinal 
and axillary lymph nodes (Fig. 2a). Frequencies of IFN-γ-, TNF-α- and 
IL-10-producing CD4+ T cells were comparable between germ-free 
and SPF RR mice (Fig. 2a and Supplementary Fig. 1). Apart from a 
minor increase in the frequency of CD4+ T cells in the spleen of germ- 
free RR mice and a reduction of the T cells expressing lower levels of 
T-cell receptor (TCRαβ^low) (activated T cells) in the lamina propria 
(Supplementary Figs 2 and 3), the proportions of most lymphoid cell 
types examined, including Foxp3+ Treg cells, CD8+ T cells, TCRγδ+ 
cells, B cells, CD11b+ macrophages, CD11c+ dendritic cells, natural 
killer (NK) cells and Gr1+ granulocytes were unchanged (Supplemen- 
tary Fig. 2). Of note, although in the spleen the microbial colonization 
status did not affect cellular composition, it definitely impinged on 
cytokine production of splenic immune cells. As in MOG-immunized 
germ-free C57BL/6 mice’, germ-free RR mouse spleen cells released 
lower levels of IL-17 than their SPF counterparts upon MOG antigen 
or anti-CD3 monoclonal antibody stimulation, and in addition they 
showed reduced secretion of IFN-γ. Re-colonization of germ-free mice 
not only restored T-cell cytokine production capacity but even led to 
overshooting reactions (Supplementary Fig. 4). 

The commensal microbiota could act on MOG-specific T cells 
either via microbial structures mimicking MOG epitopes” or through 
innate immune signals creating a particular inflammatory milieu. 
In an attempt to probe a potential MOG-specific mimicry response, we 
transferred carboxyfluorescein succinimidyl ester (CFSE)-labelled 
TCR transgenic or wild-type T cells into SPF wild-type mice and tested 
their proliferative responses in the gut. Proliferation rates of transgenic 
and polyspecific wild-type T cells in the GALT were equally high, 
whereas in the remote spleen of the same recipients the responses 
remained hardly detectable (Fig. 2b). Further, the microbial signals 
seem to act persistently on local T cells. Transient depletion of gut 
flora by short-term antibiotic treatment significantly reduced the pro- 
liferation of T cells in the lamina propria, but not in spleen, pooled 
lymph nodes, mesenteric lymph nodes and Peyer’s patches (Sup- 
plementary Fig. 5). 

Activation of MOG-specific T cells in the GALT is necessary for the 
development of EAE in RR mice, but not sufficient. Full clinical EAE 
requires the participation of MOG-reactive B lymphocytes. We pro- 
posed that in RR mice, transgenic pathogenic T cells select the auto- 
immune B cells from the native B-cell repertoire and drive them to 
proliferate and release autoantibodies of IgG classes’. Indeed, we now 
found that germ-free RR mice, which, owing to missing microbial 
stimuli, lack activated autoimmune T cells, produced only low doses 
of anti-MOG autoantibodies. The autoantibody production was 
promptly increased in germ-free mice upon re-colonization (Fig. 3a). 
This response could involve antigenic mimicry at the B-cell level 
between MOG and epitopes on commensal microbes, reminiscent of 
Sydenham’s chorea—the CNS manifestation of rheumatic fever—in 
which streptococcal antigens mimic neuronal B-cell epitopes’. 
However, this was not the case in spontaneous RR mouse EAE. We 
discovered that production of demyelinating autoantibodies critically 
depended on the expression of the target myelin autoantigen, MOG. 
RR mice deficient in MOG (RR × MOG−/−), due to a transgenic 
mutation of the Mog gene", failed to develop anti-MOG autoantibody 
titres despite their normal microbial status (Fig. 3a). Importantly, our 
data show that exogenous MOG injected into SPF RR × MOG−/− 
mice via MOG in CFA readily induced anti-MOG antibodies 
(Supplementary Fig. 6). 

Figure 2 | Effect of microbiota on T-cell activation and their cytokine 
profiles in the GALT. a, Impaired TH17 differentiation in germ-free (GF) RR 
mice. Frequencies of IL-17- or IFN-γ-producing T cells from the indicated 
organs of GF and SPF RR mice are shown. LP, lamina propria; mLN, mesenteric 
lymph nodes; pLN, pooled inguinal and axillary lymph nodes; PP, Peyer’s 
patches. n = 8-13 mice per group. Data were pooled from four independent 
experiments. ***P = 0.0002; **P = 0.0025 (Mann-Whitney U test). 
b, Activation of T cells by commensal flora. Shown are the frequencies of 
CFSE^low CD4+ cells in the indicated organs of mice that received CFSE-labelled 
CD4+ T cells. Each circle represents an individual mouse and bars depict 
mean ± s.e.m. n = 4-7 mice per group. Data represent two individual 
experiments. *P < 0.05 (Mann-Whitney U test). 

Recruitment and activation of antigen-specific B cells involves signals 
by local helper T cells and surrounding stroma cells, which together 
drive the resting B cell into a germinal centre, where it undergoes 
proliferation, immunoglobulin class switching and somatic hyper- 
mutation’’. Binding of the cognate antigen to the B-cell receptor has 
a central role in these processes. MOG-specific B cells could be 
recruited either in the CNS tissue via locally produced MOG material’®, 
or in CNS draining peripheral lymph nodes (deep cervical lymph 
nodes) with MOG imported from the CNS via lymphatic vessels’’. Our 
observations favour the latter alternative. Prior to the onset of clinical 
symptoms in SPF RR mice we found some scattered B cells in CNS 
infiltrates (Supplementary Fig. 7) but no follicle-like aggregates or 
follicular markers (data not shown). However, there were conspicuous 
changes in the cervical lymph nodes. Starting from the age of 3 weeks, 
cervical lymph nodes contained clearly delineated germinal centres 
(Fig. 3b). Furthermore, germinal centres and increased frequencies 
of GL7+ and Fas+ germinal centre B cells were restricted to cervical 
lymph nodes of RR mice, but were significantly reduced in age- 
matched germ-free and non-transgenic littermates (NTL) (Fig. 3b, c). 

Figure 3 | B-cell recruitment is impaired in germ-free RR mice. a, Reduced 
production of MOG-specific IgG2a antibodies in sera of germ-free (GF) RR 
mice. SPF (n = 15); germ-free (n = 24); germ-free mice re-colonized with 
conventional commensal microbiota (Ex-GF; n = 7); or SPF-MOG−/− 
(n = 13) RR mice. Error bars indicate s.e.m. ***P < 0.001 (Kruskal-Wallis 
test). b, c, Spontaneously formed germinal centre B cells are enriched in the 
cervical lymph nodes (cLN) of RR mice. b, Immunofluorescence staining of 
cervical lymph nodes or inguinal lymph node (iLN) sections of RR and NTL 
mice. Peanut agglutinin (PNA) (green; germinal centre), B220 (red; B cells) and 
4',6-diamidino-2-phenylindole (DAPI) (blue; cell nuclei). Representative data 
of 4-5 individual mice are shown. Scale bars, 100 μm. c, Flow cytometric 
analysis of GL7+ Fas+ germinal centre B cells from spleen, cervical lymph 
nodes, inguinal lymph nodes and bone marrow (BM). Each circle represents an 
individual mouse and bars represent mean ± s.e.m. n = 5-10 per group. Data 
were pooled from three independent experiments. *P < 0.05 (Mann-Whitney 
U test). 

Germinal centres are attractive milieus for B cells, provided they 
contain appropriate antigenic material and competent T-helper cells. 
To investigate whether RR mouse cervical lymph nodes offer both 
prerequisites to MOG-reactive B cells, we transferred GFP-labelled, 
MOG-reactive B cells expressing the H chain of a MOG monoclonal 
antibody (IgH^MOG) into hosts with a distinct allotype and traced their 
homing behaviour (Fig. 4a). When examined 14 days after transfer, 
MOG-specific B cells were found densely packed within germinal 
centres of cervical lymph nodes (Fig. 4b). Further, donor IgH^MOG B 
cells switched anti-MOG antibodies to IgG2a isotypes (Fig. 4c). 
However, GFP-labelled polyclonal wild-type B cells failed to accu- 
mulate in any of the lymph nodes of RR mice (Fig. 4b), and IgH^MOG 
B cells neither homed to the cervical lymph nodes of wild-type or RR × 
MOG−/− mice (Fig. 4b and Supplementary Fig. 8) nor produced class- 
switched anti-MOG antibodies (Fig. 4c and Supplementary Fig. 8c). 
Collectively, these data indicate an ongoing MOG-specific germinal 
centre reaction, which is critically dependent on the expression of 
MOG, in the cervical lymph nodes of RR mice. 

Figure 4 | MOG-specific B cells home to the germinal centre of brain 
draining cervical lymph nodes. a, Schematic representation of experimental 
set up. IHC, immunohistochemistry. b, MOG-specific B cells home to the 
germinal centre of brain draining cervical lymph nodes. PNA (blue; germinal 
centre) and B220 (red; B cells). Arrowheads indicate transferred GFP+ B cells. 
Dotted lines define boundaries of germinal centre. Representative staining of 
two independent experiments is shown. n = 4-5 mice per group. 
Magnification: ×20. c, Transgenic MOG-specific B cells spontaneously switch 
isotype in RR but not in RR × MOG−/− mice. Titres of donor (a allotype) and 
recipient (b allotype) anti-MOG antibodies were measured. Error bars 
represent s.e.m. n = 4-5 mice per group. Data were pooled from three 
independent experiments. *P < 0.05 (Mann-Whitney U test). 

This is the first report, to our knowledge, describing the sequential 
roles of the intact commensal gut flora and of myelin autoantigen in 
the initiation of a complex spontaneous demyelinating autoimmune 
disease. We propose a two-phase scenario that starts out in the GALT 
with expanding and activating CNS autoreactive T cells, which then 
recruit autoantibody-producing B cells. Together the autoimmune T 
and B cells trigger a demyelinating encephalomyelitis, which in the RR 
SJL/J mouse takes a relapsing-remitting course, very similar to early 
human multiple sclerosis. 

Our findings are of direct relevance to multiple sclerosis, the patho- 
genesis of which is presently hotly debated. Some propose primary 
changes in the CNS target as the initiating process”, whereas others 
suggest that pathogenesis originates in the immune system”. Our 
present data support the latter concept. It is tempting to extend our 
finding of the gut origin of experimental CNS autoimmunity to human 
multiple sclerosis. There is now emerging evidence implicating gut 
microbiota in the starting phase of human autoimmune diseases. 
Besides inflammatory bowel diseases, in which bacteria may act on 
local tissue directly as well as indirectly”, inflammatory diseases with 
remote tissues affected seem to be modulated by the gut environment; 
for example, in rheumatoid arthritis” and type 1 diabetes mellitus*®. In 
multiple sclerosis, evidence for commensal microbial contributions 
has remained less clear, so far. Dietary risk factors have been suggested 
to have a role’, and may contribute to a conspicuous increase of 
multiple sclerosis prevalence in Asian countries, like Japan, which 
has been ascribed to the spreading of a ‘westernized’ lifestyle**. It will 
be of interest now to search for the composition of intestinal micro- 
biota associated with an increased susceptibility to multiple sclerosis, 
and this may provide a conceptual basis for exploring new, non-invasive 
treatment strategies. 


METHODS SUMMARY 

Mice. Germ-free animals were re-derived from SJL/J anti-MOG TCR transgenic 
RR mice and kept germ free at the animal facility of the Max Planck Institute of 
Immunobiology and Epigenetics. Mice were re-colonized by housing in bedding 
material pre-conditioned by SPF mice. 

Cell purification, flow cytometry and adoptive transfers. Single-cell suspensions 
were prepared from spleen, lymph nodes, Peyer’s patches and lamina propria by 
enzymatic digestion or mechanical disruption. Untouched T cells and B cells were 
purified using negative isolation kits (R&D Systems). Cells were stained with 
fluorochrome-labelled antibodies and acquired on FACSCalibur (BD 
Biosciences). Data were analysed using FlowJo (TreeStar) software. CFSE-labelled 
T cells or GFP+ B cells were injected intravenously into SPF mice. CFSE^low T cells 
were quantified by FACS 3 days after transfer. Localization of GFP+ B cells was 
documented by immunofluorescence after 2 weeks. 

Immunofluorescence. Sections of immune organs were stained with PNA (Vector 
Laboratories), anti-mouse B220 (BD Biosciences) and DAPI (Invitrogen). Images 
were obtained with a fluorescence microscope (Axiovert 200M; Carl Zeiss) and 
processed with MetaMorph 7.7 Software and Adobe Photoshop CS4. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS 

Mice and colonization. Wild-type SJL/J, RR and RR × MOG−/− SJL/J and 
IgH^MOG × actin-GFP and actin-GFP SJL/J mice were bred at the animal facility 
of the Max Planck Institute of Neurobiology. Germ-free RR mice were obtained by 
transferring embryos, isolated by sterile hysterectomy on embryonic day (E) 18.5, 
to sterile breeding conditions and by fostering on germ-free foster mothers. They 
were bred and maintained in positive-pressure plastic isolators and provided with 
γ-irradiated commercial rodent diet and autoclaved water at the animal facility of 
the Max Planck Institute of Immunobiology and Epigenetics. Fecal samples were 
routinely cultured in standard I-Bouillon and examined for contamination. In 
addition, mice from the colony were screened bi-annually according to FELASA 
health monitoring recommendations. For re-colonization experiments, germ-free 
mice were placed in cages with bedding material pre-conditioned by conventional 
(SPF) mice. All animal procedures were in accordance with the guidelines of the 
Committee on Animals of the Max Planck Institute of Neurobiology and with a 
license from the Regierung von Oberbayern. 

Active induction of EAE. Mice were immunized subcutaneously with 200 μg 
rMOG emulsified in Freund’s adjuvant supplemented with 5 mg ml−1 
Mycobacterium tuberculosis (strain H37Ra; Difco). On days 0 and 2 after immun- 
ization, 200 ng of pertussis toxin (List Biological Laboratories) were injected intra- 
peritoneally. Clinical scoring of EAE was done as published’. 

Antibiotic treatment. For short-term antibiotic treatment, 8-week-old wild-type 
SJL/J mice were treated for 7 days with 1 g l−1 of metronidazole (Sigma), 1 g l−1 of 
neomycin (Sigma) and 0.5 g l−1 of vancomycin (AppliChem) in their drinking water. 

CFSE labelling and adoptive transfer. Splenocytes from RR or wild-type mice 
were labelled at 37 °C for 10 min with 5 μM CFSE (Invitrogen) in PBS containing 
1% fetal bovine serum (FBS). Cells were washed twice in ice-cold PBS and sub- 
sequently CD4+ T cells were isolated using a mouse CD4+ T-cell isolation kit 
(R&D Systems). 5 × 10° CFSE-labelled CD4+ T cells were injected intravenously 
into wild-type SJL/J mice. 

B-cell isolation and adoptive transfer. B cells were isolated from spleens using a 
mouse B-cell isolation kit (R&D Systems). B cells were enriched to >90% purity, as 
confirmed by flow cytometry. 10 × 10° purified IgH^MOG-GFP or wild-type-GFP
B cells were intravenously injected into RR, wild-type or RR × MOG⁻/⁻ mice.

Proliferation assay. For the proliferation assay, 2 × 10° splenocytes were cultured
in the presence of various concentrations of rMOG or anti-CD3 antibody (BD
Pharmingen), as indicated. Proliferative response was measured by the incorporation
of [³H]-thymidine (1 µCi well⁻¹) during the last 16 h of a 72 h culture period.
Proliferation assays were performed in triplicate.

Cell isolation and flow cytometry. Single-cell suspensions were prepared from 
spleen, pooled peripheral lymph nodes (axillary plus inguinal), or individual 
lymph nodes (cervical and inguinal), mesenteric lymph nodes or Peyer’s patches 
by mechanical disruption via forcing through 40-µm cell strainers (BD
Biosciences). For the isolation of lamina propria lymphocytes, small intestine 
was collected in ice-cold HBSS buffered with 15 mM HEPES. After careful removal 
of Peyer’s patches, fatty tissue and fecal contents, the intestine was opened
longitudinally and cut into small pieces. The intestinal fragments were washed
three times for 15 min with stirring (300 r.p.m.) in HBSS containing 5 mM EDTA,
15 mM HEPES and 10% FBS. Next, intestinal pieces were washed once for 5 min
with stirring in RPMI containing 15 mM HEPES and 10% FBS, followed by an
incubation step at 37 °C with stirring (500 r.p.m.) in RPMI with 15 mM HEPES,
10% FBS and 100 U ml⁻¹ Collagenase VII (Sigma). The digested tissue was washed
once in RPMI with 15 mM HEPES and 10% FBS, before the lamina propria
lymphocytes were subjected to FACS analysis. CNS infiltrating cells were purified 
by Percoll gradient centrifugation as described. For detection of cell surface
markers, cells were stained in FACS buffer (PBS containing 1% BSA and 0.1%
NaN₃) with fluorochrome-labelled monoclonal antibodies: PerCP-conjugated
anti-CD4 (RM4-5), PerCP-Cy5.5-conjugated anti-B220 (RA3-6B2), PE- and
APC-conjugated anti-CD19 (1D3), APC-conjugated anti-CD8α (53-6.7),
PE-conjugated anti-TCRγδ (eBioGL3), PE-conjugated anti-CD11b (M1/70),
APC-conjugated anti-NKp46 (29A1.4), FITC-conjugated anti-CD11c (HL3),
biotin-conjugated anti-Gr1 (RB6-8C5), FITC-conjugated anti-CD45.1 (A20),
FITC-conjugated anti-Vβ4 (KT4), PE-conjugated anti-Vα8.3 (B21.14),
FITC-conjugated anti-GL7, PE-conjugated anti-Fas (Jo2) and PE-conjugated
streptavidin. For intracellular cytokine staining, cells were activated with
50 ng ml⁻¹ PMA (Sigma) and 500 ng ml⁻¹ ionomycin (Sigma) in the presence of
5 µg ml⁻¹ brefeldin A (Sigma) for 4 h at 37 °C. After surface staining, cells were fixed
and permeabilized in 4% paraformaldehyde/0.1% saponin in HEPES-buffered
HBSS and stained intracellularly using the following antibodies: PE-conjugated
anti-IL-17 (TC11-18H10), APC-conjugated anti-IFN-γ (XMG1.2),
APC-conjugated anti-TNF-α (MP6-XT22), PE-conjugated anti-IL-10 (JES5-16E3)
and APC-conjugated anti-FoxP3 (FJK-16s). All antibodies were purchased from
BD Pharmingen or eBioscience. Cells were acquired on a FACSCalibur (BD 
Biosciences) and analysis was performed using FlowJo (TreeStar) software. 

ELISA. Serum titres of anti-MOG antibodies were quantified as previously
described. Cytokine levels in cell culture supernatants were determined with
antibody pairs for IFN-γ (BD Biosciences) or IL-17 (eBioscience).

Immunofluorescence. Organs were fixed in PBS with 4% paraformaldehyde and
cryoprotected in PBS plus 30% sucrose before embedding in OCT medium (A.
Hartenstein). Cryostat sections (10 µm in thickness) of spleen, lymph nodes and
brains were fixed in acetone. Sections were blocked with PBS and 5% BSA before 
being stained in a humidified chamber. The following antibodies were used for 
staining: biotin-conjugated anti-CD4 (BD Pharmingen), purified anti-B220 (BD 
Pharmingen), biotin-conjugated PNA (Vector Laboratories), Alexa Fluor 568- 
conjugated anti-rat IgG (Invitrogen), Alexa Fluor 488-conjugated Streptavidin 
(Invitrogen), APC-conjugated streptavidin (eBioscience) and DAPI (Invitrogen). 
Images were obtained with a fluorescence microscope (Axiovert 200M; Carl Zeiss) 
and processed with MetaMorph 7.7 Software and Adobe Photoshop CS4. 
Statistical analysis. GraphPad Prism 5 (GraphPad Software) was used for all 
statistical analysis. P values <0.05 were considered to be significant. 

Aspartate 112 is the selectivity filter of the human
voltage-gated proton channel


Boris Musset, Susan M. E. Smith, Sindhu Rajan, Deri Morgan, Vladimir V. Cherny & Thomas E. DeCoursey


The ion selectivity of pumps and channels is central to their ability
to perform a multitude of functions. Here we investigate the mechanism
of the extraordinary selectivity of the human voltage-gated
proton channel, Hv1 (also known as HVCN1). This selectivity is
essential to its ability to regulate reactive oxygen species production
by leukocytes, histamine secretion by basophils, sperm capacitation,
and airway pH. The most selective ion channel known, Hv1
shows no detectable permeability to other ions. Opposing classes of
selectivity mechanisms postulate that (1) a titratable amino acid
residue in the permeation pathway imparts proton selectivity,
or (2) water molecules ‘frozen’ in a narrow pore conduct protons
while excluding other ions. Here we identify aspartate 112 as a
crucial component of the selectivity filter of Hv1. When a neutral
amino acid replaced Asp 112, the mutant channel lost proton specificity
and became anion-selective or did not conduct. Only the
glutamate mutant remained proton-specific. Mutation of the
nearby Asp 185 did not impair proton selectivity, indicating that
Asp 112 has a unique role. Although histidine shuttles protons in
other proteins, when histidine or lysine replaced Asp 112, the
mutant channel was still anion-permeable. Evidently, the proton
specificity of Hv1 requires an acidic group at the selectivity filter.

Voltage-gated proton channels are considered specific (perfectly
selective) for protons, because no evidence exists for permeation of
anything but H⁺. Specificity, combined with a large deuterium isotope
effect and extraordinarily strong temperature dependence of conduction,
suggests a permeation pathway more complex than a simple
water wire, as exists in gramicidin. All proton conduction seems
consistent with a hydrogen-bonded chain (HBC) mechanism; an HBC
including a titratable group could explain several unique properties of
Hv1 (ref. 1), especially proton selectivity. Yet in a recent study, mutation
of each titratable amino acid in all four transmembrane helices of
Hv1 failed to abolish conduction. Thus, the mechanism producing
proton selectivity remained unknown.

We noticed that a human gene, C15orf27 (of unknown function),
contains a predicted voltage sensor domain (VSD) that shares 25%
sequence identity and 52% similarity (http://www.ebi.ac.uk/Tools/emboss/align/needle)
with the VSD of Hv1, and includes three Arg
residues in the S4 transmembrane helix that are conserved among all
known Hv1 homologues. Phylogenetic analysis of VSD sequences
(Supplementary Fig. 1) reveals that a group comprising Hv1, C15orf27 and
voltage-sensitive phosphatase (VSP) sequences separated early from
the two phylogenetically distinct groups of depolarization-activated
VSDs described previously (Kv channels and Nav/Cav channels),
supporting the modular evolution of VSD-containing proteins.
Furthermore, Hv1 VSDs occupy a discrete lineage, distinct from those
of VSP and C15orf27 orthologues.

When we cloned the C15orf27 gene and expressed the product in
HEK-293 or COS-7 cells, the green fluorescent protein (GFP)-tagged
protein localized at the plasma membrane (Supplementary Fig. 2), but
we detected no currents beyond those in non-transfected cells. We
reasoned that substitutions based on sequence elements that differ
between Hv1 and C15orf27 should be structurally tolerated while
revealing residues responsible for proton conduction. We therefore
mutated residues that are perfectly conserved in 21 Hv1 family members
and differ between C15orf27 and Hv1. We replaced five candidate
residues in Hv1 (D112, D185, N214, G215 and S219) (Fig. 1a, b) with
the corresponding residue in C15orf27. Four mutants exhibited large
currents under whole-cell voltage clamp (Fig. 1d). The reversal (zero
current) potential, Vrev, measured at several pHo and pHi (external and
internal pH, respectively), was close to the Nernst potential for protons,
EH (Fig. 1c), demonstrating proton selectivity. D112V mutants localized
to the plasma membrane (Supplementary Fig. 3), but showed no convincing
current (Fig. 1d). Some D112V-transfected HEK-293 or COS-7
cells (and non-transfected cells) had small native proton currents.
H140A/H193A double mutants, in which the two Zn²⁺-binding
His residues are neutralized, resemble wild type, with similar
ΔpH-dependent gating, and Vrev near EH (Supplementary Fig. 4). We
expressed mutants in this Zn²⁺-insensitive background (D112X/A/A)
to distinguish their currents from native currents that are abolished
by 100 µM Zn²⁺ at pHo 7.0. We tentatively concluded that Asp 112 is
crucial to proton conduction.

The absence of detectable currents in D112V led us to make other
D112X substitutions. These mutants (Fig. 2a) showed slowly activating
outward currents upon depolarization that resembled Hv1 currents.
As reported previously, Asp 112 mutation had little effect on the ΔpH
dependence of gating. The proton conductance-voltage (gH-V)
relationship of all D112X mutants shifted roughly −60 mV when pHo
increased from 5.5 to 7.0 (Supplementary Fig. 5), as in wild-type
channels. Mutation of Asp 112 did influence channel opening
and closing kinetics (Supplementary Table 2).

Measurements of Vrev in Asp 112 mutants showed a marked departure
from wild-type Hv1 properties. At symmetrical pH 5.5, Vrev was near
0 mV (not shown). At pHo 7.0, pHi 5.5 (Fig. 2a, column 3), wild-type
channels reversed near EH (−87 mV), indicating proton selectivity.
But for all mutants except D112E, Vrev was substantially positive to
EH (Fig. 2b), ranging from −58 mV (D112H) to −13 mV (D112N).
Substitutions at Asp 112 eliminated the proton specificity that
distinguishes Hv1 from all other ion channels. A previous study described
currents in D112A and D112N mutants, but did not report Vrev.
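
As a rough numerical check on the −87 mV value quoted above, the Nernst potential for protons can be computed from the pH gradient. The sketch below is illustrative only: the function name and the default of 20 °C (the lower end of the 20-25 °C range given in the Methods) are assumptions, not part of the original analysis.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1


def nernst_proton_mv(ph_o, ph_i, temp_c=20.0):
    """Nernst potential for H+ in millivolts.

    E_H = (RT/F) * ln([H+]o / [H+]i) = (RT * ln10 / F) * (pHi - pHo)
    temp_c is an assumed recording temperature in degrees Celsius.
    """
    temp_k = temp_c + 273.15
    slope_mv = 1000.0 * R * temp_k * math.log(10.0) / F  # mV per pH unit
    return slope_mv * (ph_i - ph_o)


if __name__ == "__main__":
    # pHo 7.0, pHi 5.5: close to -87 mV at 20 degrees C
    print(round(nernst_proton_mv(7.0, 5.5), 1))
    # symmetrical pH 5.5: zero driving force for protons
    print(nernst_proton_mv(5.5, 5.5))
```

At 25 °C the same gradient gives a value about 1.5 mV more negative, so the quoted figure is consistent with recordings near 20 °C.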

We expected that loss of proton selectivity would result in nonselective
permeation of cations. Surprisingly, Vrev did not change detectably
when Na⁺, K⁺, N-methyl-D-glucamine⁺ or TEA⁺ (tetraethylammonium⁺)
replaced TMA⁺ (tetramethylammonium⁺) (Supplementary
Table 4). To test anion against cation selectivity, we adopted the classical
tactic of replacing a fraction of the bath solution with isotonic sucrose.
The Nernst equation predicts that dilution of all extracellular ions
except H⁺ and OH⁻ (leaving internal ion concentrations unchanged)
will shift Vrev negatively for a cation-selective channel, but positively for
an anion-selective one. Despite the tenfold reduction of buffer concentration,
direct measurement confirmed that pH remained constant.
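
The sign argument behind the sucrose test can be sketched numerically. The example below treats the idealized limit of a channel permeable to a single monovalent ion whose external concentration alone is diluted; the function name and the 20 °C default are assumptions made for illustration, and real mutants with mixed permeability would show smaller shifts.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1


def dilution_shift_mv(valence, dilution_factor=10.0, temp_c=20.0):
    """Shift in the Nernst potential (mV) of one permeant ion when only its
    external concentration is divided by dilution_factor.

    E = (RT / zF) * ln(c_out / c_in), so the shift is
    (RT / zF) * ln(1 / dilution_factor); c_in is unchanged and cancels.
    """
    temp_k = temp_c + 273.15
    return 1000.0 * R * temp_k / (valence * F) * math.log(1.0 / dilution_factor)


if __name__ == "__main__":
    print(dilution_shift_mv(+1))  # cation-selective: negative shift
    print(dilution_shift_mv(-1))  # anion-selective: positive shift
```

A tenfold dilution therefore moves the reversal potential by roughly 58 mV in opposite directions for the two cases, which is why the direction of the observed shift discriminates between them.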
Figure 1 | Identification of five key amino acids that differ in Hv1 and
C15orf27, and the currents generated in a heterologous expression system
by Hv1 mutants in which Hv1 residues were replaced by the corresponding
amino acid in the non-conducting C15orf27. a, Representative subset of a
multiple sequence alignment of 122 VSDs; only transmembrane helices are
shown. Gene families include Hv1, voltage-sensitive phosphatases, C15orf27,
Ca²⁺ and Na⁺ channels, and K⁺ channels (see Supplementary Fig. 1).
b, Location of the key amino acids in the open Hv1 channel VSD viewed from
the side (membrane), based on a homology model. c, Vrev in the four
conducting mutants is near EH (dashed line), indicating proton selectivity. Vrev
was measured using tail currents; in G215A Vrev was positive to threshold and
was observed directly. d, Voltage-clamp current families in cells expressing Hv1
mutants. Depolarizing pulses were applied in 10-mV increments from a
holding voltage, Vhold = −40 mV (D185M, D112V), −60 mV (G215A, S219P),
or −90 mV (N214D), with the most positive pulse labelled. After membrane
repolarization, an inward ‘tail current’ is seen as channels close (see inset for
D185M); pH is given as pHo//pHi. D112V showed no clear current.

Figure 3 illustrates determination of Vrev from tail currents in a
D112H-transfected cell at pH 5.5//5.5 (pHo//pHi) in methanesulphonate
(CH₃SO₃⁻; Fig. 3a) or Cl⁻ solution (Fig. 3c), and after 90%
reduction of external ionic strength (Figs 3b, d). Surprisingly, for
all Asp 112 mutants, sucrose shifted Vrev positively (Supplementary
Fig. 6), indicating anion selectivity both in CH₃SO₃⁻ (Fig. 3e) and Cl⁻
solutions (Fig. 3f). For Hv1 and D112E, Vrev did not change, reaffirming
their proton specificity. Neutralization of a single Asp residue
converts a proton channel into a predominantly anion-selective
channel. Thus, Asp 112 mediates charge selectivity as well as proton
selectivity.

To confirm anion permeability of Asp 112 mutants, we replaced
the main external anion, CH₃SO₃⁻, with Cl⁻. Consistent with previous
studies, Vrev in Hv1 was unchanged. As shown in D112H (Fig. 3a, c),
Vrev shifted negatively in Cl⁻ solutions in all mutants (except D112E),
indicating that Cl⁻ is more permeant than the larger CH₃SO₃⁻ anion
(Fig. 3g). That all conducting non-acidic mutants showed Cl⁻ permeability
indicates that Asp 112 mediates not only proton selectivity,
but also charge selectivity. Currents were smaller than wild type in cells
expressing some mutants (Supplementary Fig. 7), suggesting a smaller
unitary conductance. Evidently, these channels conduct anions, but
not very well.

Although the mutant channels have diminished selectivity, Vrev did
shift negatively when pHo increased from 5.5 to 7.0 (Fig. 2b). Because
these solutions differ mainly in buffer species and concentrations of
H⁺ and OH⁻, Asp 112 mutants must have significant permeability to
H⁺ and/or OH⁻. The Goldman-Hodgkin-Katz equation shows how
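Vrev depends on ion concentrations:

$$
V_{\mathrm{rev}} = \frac{RT}{zF}\,\ln\frac{P_{\mathrm{Cl^-}}[\mathrm{Cl^-}]_i + P_{\mathrm{CH_3SO_3^-}}[\mathrm{CH_3SO_3^-}]_i + P_{\mathrm{OH^-}}[\mathrm{OH^-}]_i + P_{\mathrm{H^+}}[\mathrm{H^+}]_o}{P_{\mathrm{Cl^-}}[\mathrm{Cl^-}]_o + P_{\mathrm{CH_3SO_3^-}}[\mathrm{CH_3SO_3^-}]_o + P_{\mathrm{OH^-}}[\mathrm{OH^-}]_o + P_{\mathrm{H^+}}[\mathrm{H^+}]_i},
$$

where R is the gas constant, T the absolute temperature (Kelvin), z the
ionic valence (= 1), F is Faraday’s constant and P is permeability.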

Ions with greater permeability dominate Vrev. Permeation by H⁺
and by OH⁻ are difficult to distinguish because they have the same Nernst
potential. The data can be interpreted assuming permeation of either
(Supplementary Table 3), but the anion selectivity of Asp 112 mutants
and the pH dependence of sucrose effects (Figs 3e, f) support OH⁻
permeation. The relative permeability of conducting Asp 112 mutants
was OH⁻ (or H⁺) > Cl⁻ > CH₃SO₃⁻.
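
A minimal numerical sketch of the Goldman-Hodgkin-Katz relationship for this ion set may help make these statements concrete. Everything specific here is an assumption made for illustration: the function and helper names, the permeability values, the water ion product of 10⁻¹⁴ used to obtain [OH⁻] from pH, and the simplified anion concentrations (loosely based on the main salts listed in the Methods, ignoring buffer anions).

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1
ANIONS = ("Cl", "CH3SO3", "OH")


def solution(ph, cl_mm, ch3so3_mm):
    """Concentrations in mol/l; [OH-] assumes Kw = 1e-14 (an approximation)."""
    return {
        "H": 10.0 ** (-ph),
        "OH": 10.0 ** (ph - 14.0),
        "Cl": cl_mm / 1000.0,
        "CH3SO3": ch3so3_mm / 1000.0,
    }


def ghk_vrev_mv(perm, inside, outside, temp_c=20.0):
    """GHK reversal potential (mV) for H+ plus monovalent anions.

    Cations contribute outside/inside; anions contribute inside/outside.
    perm maps ion name to relative permeability (missing ions count as 0).
    """
    temp_k = temp_c + 273.15
    p_h = perm.get("H", 0.0)
    num = p_h * outside["H"] + sum(perm.get(a, 0.0) * inside[a] for a in ANIONS)
    den = p_h * inside["H"] + sum(perm.get(a, 0.0) * outside[a] for a in ANIONS)
    return 1000.0 * R * temp_k / F * math.log(num / den)


if __name__ == "__main__":
    inside = solution(5.5, cl_mm=4, ch3so3_mm=130)   # pipette, pHi 5.5
    outside = solution(7.0, cl_mm=6, ch3so3_mm=90)   # bath, pHo 7.0
    print(ghk_vrev_mv({"H": 1.0}, inside, outside))   # pure H+ channel
    print(ghk_vrev_mv({"OH": 1.0}, inside, outside))  # pure OH- channel, same value
    # hypothetical leaky mutant: some Cl- permeability pulls Vrev positive of E_H
    print(ghk_vrev_mv({"H": 1.0, "Cl": 1e-5}, inside, outside))
```

The first two calls return the same potential, which is the sense in which H⁺ and OH⁻ permeation cannot be told apart from Vrev alone; the third shows how even a small relative anion permeability moves Vrev away from EH because the anions are present at millimolar rather than sub-micromolar concentrations.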

Although Asp 112 is essential to selectivity, other acidic groups 
might participate. We mutated Asp 185, located in the presumed conduction
pore (Fig. 1b). However, like D185M (Fig. 1b), D185V,
D185A and D185N remained proton-selective (Supplementary Fig. 8). 
As evidence against additive effects, the double mutant D112N/ 
D185M did not differ from D112N (Supplementary Fig. 8). 
Figure 2 | Currents in Asp 112 mutants resemble proton currents, but are
not. a, Currents generated by wild type (WT), D112E, D112H, D112K/A/A,
D112N/A/A, D112S, D112A/A/A and D112F/A/A in COS-7 cells (pHi 5.5) at
pHo 5.5 (column 1) or 7.0 (column 2), during families of pulses in 10 mV
increments up to indicated voltages. Tail currents at pHo 7.0 (column 3) reveal
that Vrev deviates from EH, indicating loss of proton selectivity. At pHo 5.5 Vhold
was −40 mV (−60 mV for WT). At pHo 7.0 Vhold was −40 mV (K, N, S),
−50 mV (F), −60 mV (H, A), −80 mV (E), or −90 mV (WT); Vpre was
−65 mV (WT), −40 mV (E), −10 mV (H), +50 mV (K), +40 mV (N, S), or
+20 mV (A, F). Vrev (arrows) was determined from the amplitude and
direction of tail current decay. For D112N, Vrev was above Vthreshold and was
evident during pulse families. b, Shift in Vrev when the TMACH₃SO₃ bath
solution was changed from pH 5.5 to 7.0. There is no difference between WT
and D112E, but the shift in all other mutants is smaller than WT (P < 0.001, by
one-way ANOVA followed by Tukey’s test, n = 7, 4, 9, 8, 6, 7, 9 and 4). Error
bars in b are s.e. Dashed line shows EH.

Consistent with earlier predictions that a titratable amino acid provides
the selectivity filter of Hv1 (refs 1, 8-11), only channels with
acidic residues (Glu or Asp) at position 112 manifested proton specificity.
Asp 112 lies at the constriction of the presumed pore (Fig. 1b),
a logical location for a selectivity filter, and just external to the postulated
gating charge transfer centre. Our original prediction envisioned
selectivity arising from protonation/deprotonation of a
residue during conduction, but other mechanisms are possible. For
example, proton selectivity of the influenza A M2 viral proton channel
has been explained by (1) immobilized water, (2) successive proton
transfer and release by His 37 (refs 22, 23) and (3) delocalization of the
proton among His 37 and nearby water molecules.

The Cl⁻ permeability of D112H was completely unexpected, given
strong precedents for His imparting proton selectivity to channels.
Histidine shuttles protons in K⁺ or Na⁺ channel VSDs with Arg→His
mutations, in carbonic anhydrase and in M2 channels. However,
these molecules are not proton-specific. Evidently, His shuttles
protons, but does not guarantee proton selectivity. In Hv1, Asp 112
(or Glu 112 in D112E) excludes anions, resulting in proton-specific
conduction. When protonated, Glu and Asp are neutral whereas His is
cationic, which may explain why D112H fails to exclude anions.

The anion selectivity of neutral Asp 112 mutants indicates that electrostatic
forces due to the charge distribution in the rest of the channel
deter cation permeation, and that the cation selectivity of the wild-type
channel is due to the anionic charge of Asp 112. Asp 185 does not
participate directly in selectivity (Supplementary Fig. 8). VSP family
members possess the equivalent of Asp 112 (Fig. 1a), yet conduct no
current, illustrating that Asp 112 requires a specific microenvironment
to achieve selectivity. Although permeation of Cl⁻ and CH₃SO₃⁻
suggests a wide pore in D112X mutants, local geometry might differ in
wild-type channels due to the presence of anionic Asp 112.

Regulation of voltage gating by ΔpH is distinct from permeation.
Pathognomonic of Hv1 is a strict correlation between the gH-V relationship
and Vrev, in which Vthreshold shifts 40 mV per unit change in
ΔpH (ref. 8). The ΔpH dependence persisted in mutants with shifted
gH-V relationships. Here we show uncoupling of Vrev and voltage
gating. Asp 112 mutants retained normal ΔpH dependence
(Supplementary Fig. 5), despite the dissociation of Vrev from ΔpH (Fig. 2
and Supplementary Fig. 9). This uncoupling of pH control of gating
from permeation speaks against any mechanism that invokes regulation
by local proton concentration in the vicinity of S4 Arg residues.
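
The contrast between gating and permeation can be put in numbers. The sketch below applies the 40 mV per unit ΔpH rule to the pHo change used in these experiments and compares it with the ideal shift in EH; the function names, the convention ΔpH = pHo − pHi, and the 20 °C temperature are assumptions for illustration.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1


def gating_shift_mv(ph_o_start, ph_o_end, ph_i=5.5, slope_mv_per_unit=40.0):
    """Expected shift of the gH-V relationship (mV) when pHo changes.

    Uses delta_pH = pHo - pHi; increasing delta_pH shifts gating negatively
    by slope_mv_per_unit per pH unit.
    """
    delta_start = ph_o_start - ph_i
    delta_end = ph_o_end - ph_i
    return -slope_mv_per_unit * (delta_end - delta_start)


def e_h_shift_mv(ph_o_start, ph_o_end, temp_c=20.0):
    """Shift in the proton Nernst potential (mV) for the same pHo change."""
    slope = 1000.0 * R * (temp_c + 273.15) * math.log(10.0) / F
    return -slope * (ph_o_end - ph_o_start)


if __name__ == "__main__":
    print(gating_shift_mv(5.5, 7.0))  # gating: -60 mV
    print(e_h_shift_mv(5.5, 7.0))     # ideal proton-selective Vrev: about -87 mV
```

In the mutants, gating still follows the first number while the measured Vrev shift falls well short of the second, which is the uncoupling described in the text.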

In summary, Asp 112 is a critical component of the selectivity filter 
of Hv1, crucial to both proton selectivity and charge selectivity. That
D112E was proton-selective, but D112H conducted anions indicates 
that this proton channel requires an acid at the selectivity filter. That 
neutralization of nearby Asp 185 did not affect selectivity suggests that 
Asp 112 has a unique role. 
Figure 3 | Dilution of ionic strength by 90% with isotonic sucrose shifted
Vrev positively, indicating that most Asp 112 mutants are anion-selective.
a, Measurement of Vrev by tail currents in a cell transfected with D112H at
pH 5.5//5.5, and b, after sucrose. c, Vrev in the same cell in pH 5.5 Cl⁻ solution,
and d, after sucrose. Arrows indicate zero current. Vhold = −40 mV,
Vpre = +60 mV. e, Mean shifts of Vrev with decreasing ionic strength in
CH₃SO₃⁻ solutions or f, in Cl⁻ solutions. Each value was determined in 3-6
cells. X = not done. g, Shifts of Vrev when CH₃SO₃⁻ was replaced by Cl⁻. Values
for WT and D112E do not differ significantly from 0 mV. For all anion-selective
mutants except D112N, the difference between shifts at pH 5.5 and 7.0 was
significant (P < 0.001, one-way ANOVA followed by Tukey’s test; n = 3-8).
Error bars in e-g are s.e.


METHODS SUMMARY 


The pipette solution (also used externally) contained (in mM) 130 TMACH₃SO₃, 2
MgCl₂, 2 EGTA, 80 MES (2-(N-morpholino)ethanesulphonic acid), titrated to
pH 5.5 with ~20 TMAOH. In the pH 5.5 TMACl solution, TMACl replaced
TMACH₃SO₃. Bath solutions at pH 7.0 had (mM) 90 TMACH₃SO₃ or TMACl,
3 CaCl₂, 1 EGTA, 100 BES and 36-40 TMAOH. For experiments with Zn²⁺,
solutions contained PIPES without EGTA. Experiments were done at 20-25 °C.
Currents are shown without leak correction. Vrev data were corrected for liquid
junction potentials measured in each solution.


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


Exhaustive searches to identify Hv1 homologues were performed using protein
BLAST and PSI-BLAST. A sample of VSDs from K⁺, Na⁺ and Ca²⁺ channels
(that open with depolarization like Hv1, and in addition one that opens with
hyperpolarization), along with putative Hv1, VSP and C15orf27 homologues were
chosen. For cation channels, we sampled from the range of subfamilies, from the
VSD repeats within Na⁺ and Ca²⁺ channels, and from the range of species. VSD
sequences, including crystallized K⁺ channels (PDB accessions 1ORS, 2R9R and
2A79), were aligned using PROMALS3D, which incorporates structural information,
allowing high-confidence identification of VSD boundaries. Sequences were
trimmed to the VSD, realigned with PROMALS3D, and the resulting alignment was
analysed with PhyML (maximum likelihood) and Protpars (maximum parsimony)
at the Mobyle portal. Trees were visualized with TreeDyn and
iTOL. Parsimony (not shown) and maximum likelihood trees had similar topology,
including Hv1 and C15orf27 families separating into discrete branches. A
homology model of the VSD of Hv1 was constructed as described previously.
The C15orf27 clone was PCR-amplified from human cerebellum and subcloned
into pcDNA3.1(+) expression vector (Invitrogen). The coding sequence of
human Hv1 (HVCN1) was cloned into either pcDNA3.1(−) or pQBI25-fC3 (to
make GFP-Hv1) vectors as described previously. Site-directed mutants were
created using the Stratagene QuikChange (Agilent) procedure according to the
manufacturer’s instructions. All the positive clones were sequenced to confirm the
presence of the introduced mutation. HEK-293 or, more often, COS-7 cells were
grown to ~80% confluency in 35-mm culture dishes, usually by seeding cells
1 day ahead of transfection. Cells were transfected with 0.4-0.5 µg of the appropriate
cDNA using Lipofectamine 2000 (Invitrogen). After 6 h at 37 °C in 5% CO₂,
the cells were trypsinized and replated onto glass cover slips at low density for
patch clamp recording the following day. We selected green cells under fluorescence
for recording. Patch-clamp methods were described previously.

The main pipette solution (also used externally) contained (in mM) 130
TMACH₃SO₃, 2 MgCl₂, 2 EGTA, 80 MES, titrated to pH 5.5 with ~20
TMAOH. In the pH 5.5 TMACl solution, TMACl replaced TMACH₃SO₃. Bath
solutions at pH 7.0 had (mM) 90 TMACH₃SO₃ or TMACl, 3 CaCl₂, 1 EGTA, 100
BES, and 36-40 TMAOH. For experiments with Zn²⁺, solutions contained PIPES
buffer without EGTA. Experiments were done at 20-25 °C. Currents are shown
without leak correction. Vrev data were corrected for liquid junction potentials
measured in each solution.

Lowland-upland migration of sauropod dinosaurs
during the Late Jurassic epoch


Henry C. Fricke, Justin Hencecroth & Marie E. Hoerner


Sauropod dinosaurs were the largest vertebrates ever to walk the 
Earth, and as mega-herbivores they were important parts of terrestrial 
ecosystems. In the Late Jurassic-aged Morrison depositional basin 
of western North America, these animals occupied lowland river- 
floodplain settings characterized by a seasonally dry climate.
Massive herbivores with high nutritional and water needs could
periodically experience nutritional and water stress under these
conditions, and thus the common occurrence of sauropods in this
basin has remained a paradox. Energetic arguments and mammalian
analogues have been used to suggest that migration allowed
sauropods access to food and water resources over a wide region or
during times of drought or both, but there has been no direct
support for these hypotheses. Here we compare oxygen isotope
ratios (δ¹⁸O) of tooth-enamel carbonate from the sauropod
Camarasaurus with those of ancient soil, lake and wetland (that
is, ‘authigenic’) carbonates that formed in lowland settings. We
demonstrate that certain populations of these animals did in fact 
undertake seasonal migrations of several hundred kilometres from 
lowland to upland environments. This ability to describe patterns of 
sauropod movement will help to elucidate the role that migration 
played in the ecology and evolution of gigantism of these and asso- 
ciated dinosaurs. 

Inferring the behaviour of ancient organisms is difficult, but geochemical
information preserved in their fossil remains can provide
such an opportunity. This study of sauropod dinosaur behaviour relies
on the fact that δ¹⁸O values of surface waters (δ¹⁸Osw; for example,
streams and lakes) vary significantly over any given landscape in response
to differences in aridity and elevation among other environmental
factors. Authigenic carbonates (CaCO₃) form in basin soils, lakes
and wetlands, and record the oxygen isotopic characteristics of these
host isotopic domains when they precipitate. Similarly, vertebrate tooth
enamel (bioapatite, Ca₅(PO₄, CO₃)₃(OH, CO₃)) records the oxygen
isotope characteristics of the surface water reservoirs that serve as
drinking water. If δ¹⁸Osw values inferred from ‘non-migratory’ authigenic
carbonates and from dinosaur tooth enamel differ, then it can be
concluded that the dinosaurs were drinking water that fell outside the
basin and thus travelled outside it.
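
For readers unfamiliar with delta notation, the sketch below shows the standard definition of δ¹⁸O as a per mil deviation of a sample's ¹⁸O/¹⁶O ratio from that of a reference standard. The text here does not state which standard was used; the VSMOW ratio in the code is an assumed, commonly quoted value chosen only for illustration, and the function names are hypothetical.

```python
# Assumed reference: 18O/16O of Vienna Standard Mean Ocean Water (VSMOW).
R_VSMOW = 2005.20e-6


def delta18o_permil(r_sample, r_standard=R_VSMOW):
    """delta18O in per mil: (R_sample / R_standard - 1) * 1000."""
    return (r_sample / r_standard - 1.0) * 1000.0


def ratio_from_delta(delta_permil, r_standard=R_VSMOW):
    """Inverse of delta18o_permil: recover the 18O/16O ratio."""
    return r_standard * (1.0 + delta_permil / 1000.0)


if __name__ == "__main__":
    # Water at -9 per mil holds slightly less 18O than the standard.
    r = ratio_from_delta(-9.0)
    print(r, delta18o_permil(r))
```

Because the differences between waters are only a few parts per thousand in the raw ratio, comparisons are made on the δ scale, where more negative values indicate water more depleted in ¹⁸O.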

To use this approach we analysed enamel carbonate from teeth 
(n = 32) of Camarasaurus sp. and Camarasaurus lentus collected at 
Thermopolis, Wyoming, and Dinosaur National Monument, Utah 
(DNM), respectively (Fig. 1a). Palaeosol and lacustrine carbonates
were also analysed from DNM (n = 38; see Supplementary Information
for details on methods and statistics). In addition, we used published
δ¹⁸O data obtained from a variety of authigenic carbonates
found over the entire Morrison basin including the Thermopolis
area. Comparisons of isotopic data from co-occurring authigenic
carbonates and tooth enamel, from tooth-enamel carbonate and 
tooth-enamel phosphate, and from single teeth indicate that primary 
palaeobiological information is preserved in tooth enamel (see 
Supplementary Information for more details about diagenesis). 

To estimate 5'°O.rusing dinosaur tooth enamel, it is assumed that they 
fractionated oxygen isotopes in a manner similar to all water-dependent 


vertebrates studied so far, including birds, mammals and reptiles’®. 
To estimate 5'°Oy using authigenic carbonate, it is assumed that 
oxygen isotope fractionation occurred at 24 °C, a temperature consist- 
ent with modelled mean annual temperature for the region’’ (see 
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Figure 1 | Fossil localities, inferred oxygen isotope ratios of surface water 
and possible Camarasaurus migration routes. a, Palaeogeography of western 
North America during late Jurassic/Morrison time (after refs 2, 9), including 
fossil localities and one hypothetical migration route. b, 5'°O,+ estimated using 
tooth enamel (reds) and authigenic carbonates (greens; Thermopolis data from 
ref. 11; all-basin data from refs 9, 10, 12). See text and Supplementary 
Information for details. 
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Figure 2 | Oxygen isotope ratios of serial enamel samples compared with 
position relative to the base of the tooth for ten different teeth from DNM. 
Teeth form incrementally such that the oldest enamel is at the tip and the 
youngest enamel is at the base (see inset). Most teeth are characterized by a 


Supplementary Information for details about estimates of 5180,,). 
Although 5'°O,; inferred for all authigenic carbonates have relatively 
high and limited 35180 values, those inferred for Camarasaurus have 
significantly different means and variances (Supplementary Tables 
1-3). Most importantly, they preserve a record of the lowest 8'°O.¢ 
of all carbonates analysed from the Morrison basin (Fig. 1b). 

3180, values inferred from Camarasaurus that overlap with those 
from authigenic carbonates are consistent with these animals spending 
time in the fluvial and wetland environments of the basin. In contrast, 
lower 5'°O of surface water and precipitation (8'°O,¢ < approximately 
—9%o) implied by a large proportion of Camarasaurus teeth indicates 
that they occupied non-basinal settings. Low 5'*O,¢values result from 
the preferential rainout of 180 from air masses as they rise, cool and 
lose water while crossing topographic barriers such as volcanic high- 
lands west of the basin (Fig. 1a). Thus, they indicate that animals from 
both DNM and Thermopolis were drinking water from these high- 
elevation regions. Although it is possible that ‘extra-basinal’ high- 
elevation waters could flow into lakes and rivers located in the basin 
proper, the fact that lake and wetland carbonates do not have low 5'°O 
indicates that such recharge did not have a major influence on 8'*O,¢in 
the basin (Fig. 1b). Therefore Camarasaurus populations in these areas 
must have directly occupied high-elevation regions for at least part of the 
year before returning to the basin where they died. To do so, these animals 
must have migrated approximately 300 km in each direction based on 
palaeogeographical reconstructions for the Late Jurassic”?(Fig. 1a). 

gradual decrease in oxygen isotope ratios over time. Tooth DNM 36 is unworn
and thus preserves the longest temporal record (approximately 4-5 months),
whereas records from other worn teeth are truncated to various degrees.

Patterns in δ¹⁸O obtained from single teeth provide evidence that
this migration was seasonal in nature. Because vertebrate teeth,
including those of dinosaurs, form incrementally, sequential sampling
along the length of a tooth provides a record of δ¹⁸O_sw ingested during
the time of tooth formation. Intra-tooth variations in δ¹⁸O_sw
inferred from camarasaurid teeth of a single DNM C. lentus skull
(see Supplementary Information) appear to capture slightly less than
half of the sinusoidal cycle that is expected for a single year, thus
indicating that these teeth formed over approximately 4-5 months
(Fig. 2). The specific pattern in δ¹⁸O_sw implies that this animal moved
out of the basin into highland regions over the period of tooth
formation, yet the teeth are found in the basin. Such a situation is possible
because tooth enamel does not provide an instantaneous record of
ingested δ¹⁸O_sw; rather, there is a temporal lag associated with the
turnover of oxygen in the body. This lag is of the order of 2 weeks
for small mammals, and although the length of time is unknown for
sauropod dinosaurs it cannot have been longer than several weeks to a
month, otherwise seasonal variations would be obscured altogether.
Thus, over 5-6 months, this individual left the basin for the highlands
and then returned to the DNM area.
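
The timing argument for the DNM tooth record can be checked with simple arithmetic. The sketch below is illustrative only: it assumes that "slightly less than half" of an annual cycle lies between one third and five twelfths of a year, and it takes the body-water lag at its stated upper bound of about one month. Those bounds and all function names are hypothetical choices, not values or methods from the analysis.

```python
MONTHS_PER_YEAR = 12.0


def record_length_months(cycle_fraction):
    """Months of tooth growth implied by the captured fraction of one annual cycle."""
    return cycle_fraction * MONTHS_PER_YEAR


def trip_length_months(record_months, lag_months):
    """Time spent on the round trip: enamel record plus the body-water turnover lag."""
    return record_months + lag_months


# Assumed bounds on "slightly less than half a cycle" (illustrative only).
shortest_record = record_length_months(1 / 3)
longest_record = record_length_months(5 / 12)

# Assumed lag of about one month, the upper bound given in the text.
lag = 1.0

print("record length (months):", round(shortest_record, 1), "to", round(longest_record, 1))
print("trip length (months):",
      round(trip_length_months(shortest_record, lag), 1), "to",
      round(trip_length_months(longest_record, lag), 1))
```

Under these assumptions the record spans roughly 4 to 5 months and the full excursion roughly 5 to 6 months, in line with the figures quoted in the text.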

Assuming that Camarasaurus migrated in an effort to obtain the
food and water they needed to survive, they would have left the basin
during the dry season (presumably summer) when plant growth was
limited and drought might have been common, and then returned in
the wet season (presumably winter). The fact that the DNM C. lentus
died before preserving a record of basin δ¹⁸O_sw in its tooth enamel
suggests that it had recently returned, and that it died during the
transition from the dry to wet season. The similarity in bulk δ¹⁸O among
DNM teeth from other individuals suggests that other Camarasaurus
from DNM exhibited similar behaviour. Without well-constrained
intra-tooth data from Thermopolis it is not possible to describe
Camarasaurus migrational patterns in as much detail in this area.
However, the fact that δ¹⁸O_sw values inferred from Thermopolis are
generally higher than those from DNM (Fig. 1b) could mean that the
Thermopolis teeth captured a different part of the seasonal cycle in
δ¹⁸O_sw and thus that these animals might have died during a different
time of the year, that Thermopolis teeth grew during different year(s),
or that these animals visited a different (possibly lower elevation)
part of the western highlands.

Overall, the research presented here provides strong support for the
hypothesis that Camarasaurus could undertake long seasonal migrations.
It does not, however, imply that they must have done so.
Ongoing studies of other Camarasaurus populations and of other
sauropods living in different areas will allow us to determine whether
migrations were a universal characteristic of these animals, or whether
they were a behavioural response to environmental stress. In turn it will
be possible to address the role that migration might have played in the
evolution of sauropod gigantism.
Experimental infection of bats with Geomyces
destructans causes white-nose syndrome


Jeffrey M. Lorch, Carol U. Meteyer, Melissa J. Behr, Justin G. Boyles, Paul M. Cryan, Alan C. Hicks, Anne E. Ballmann,
Jeremy T. H. Coleman, David N. Redell, DeeAnn M. Reeder & David S. Blehert


White-nose syndrome (WNS) has caused recent catastrophic declines
among multiple species of bats in eastern North America. The
disease’s name derives from a visually apparent white growth of the
newly discovered fungus Geomyces destructans on the skin (including
the muzzle) of hibernating bats. Colonization of skin by this
fungus is associated with characteristic cutaneous lesions that are the
only consistent pathological finding related to WNS. However, the
role of G. destructans in WNS remains controversial because evidence
to implicate the fungus as the primary cause of this disease is
lacking. The debate is fuelled, in part, by the assumption that fungal
infections in mammals are most commonly associated with immune
system dysfunction. Additionally, the recent discovery that G.
destructans commonly colonizes the skin of bats of Europe, where
no unusual bat mortality events have been reported, has generated
further speculation that the fungus is an opportunistic pathogen and
that other unidentified factors are the primary cause of WNS.
Here we demonstrate that exposure of healthy little brown bats
(Myotis lucifugus) to pure cultures of G. destructans causes WNS.
Live G. destructans was subsequently cultured from diseased bats,
successfully fulfilling established criteria for the determination of
G. destructans as a primary pathogen. We also confirmed that
WNS can be transmitted from infected bats to healthy bats through
direct contact. Our results provide the first direct evidence that
G. destructans is the causal agent of WNS and that the recent
emergence of WNS in North America may represent translocation
of the fungus to a region with a naive population of animals.
Demonstration of causality is an instrumental step in elucidating
the pathogenesis and epidemiology of WNS and in guiding
management actions to preserve bat populations against the novel
threat posed by this devastating infectious disease.

To test the ability of G. destructans to act as a primary pathogen, we
housed healthy little brown bats (Myotis lucifugus; n = 29) in the
laboratory under hibernation conditions and treated them with
conidia of G. destructans harvested from pure culture. Histological
examination of treated bats that died during the course of the
experiment showed that lesions diagnostic for WNS were apparent by 83
days after treatment. All treated bats were positive for WNS by
histology when the trial was terminated at 102 days after treatment. In
contrast, at the end of the experiment, all bats from the negative
control group (bats treated identically but not exposed to conidia of
G. destructans; n = 34) were negative for WNS by histology.

We also investigated the potential for WNS to be transmitted from
infected to healthy animals by co-housing hibernating bats naturally
infected with WNS (collected from an affected hibernaculum and
showing clinical signs of the disease; n = 25) with healthy bats (contact
exposure group; n = 18). Eighty-nine per cent of bats in the contact
exposure group developed WNS lesions by day 102, demonstrating
for the first time that WNS is transmissible. This has important
epidemiological and disease management implications, because many
of the bat species most commonly impacted by WNS often form tight,
occasionally mixed-species clusters during hibernation, facilitating the
transfer of fungus among individuals and species. In addition, bat
species affected by WNS engage in ‘swarming’ behaviour at hibernacula
just before hibernation. During this time, there is much direct contact
between individuals as they participate in a promiscuous mating
system. Furthermore, individual bats have been documented to move
long distances between hibernacula during this period, which may, in
part, facilitate the spread of WNS across the landscape.

To determine if WNS could be spread between bats through the air,
healthy bats (n = 36) were placed in mesh cages in close proximity to
(separated by 1.3 cm), but not in direct contact with, the positive
control and treated groups. After a period of 102 days, none of the
animals exposed to possible airborne conidia from bats with WNS
showed histopathological evidence of infection. This may be due to
an inability of G. destructans conidia to travel through air at levels
sufficient to establish infections in neighbouring individuals over the
experimental interval or could reflect that conditions within the
incubators (for example, airflow patterns and/or static charges) were not
conducive to airborne transfer of conidia.

The fungal skin lesions that developed in treated and contact-exposed
animals were indistinguishable from those that occurred in
the positive control bats (Fig. 1). Additionally, the prevalence of
infection was similar between the two groups (Table 1), indicating that the
treated group did not develop disease from exposure to an excessively
high dose of conidia. Similar disease pathology between groups also
indicates that the contact-exposed bats did not develop WNS through
exposure to an agent other than G. destructans. Histological
examination of hearts, intestines, livers, lungs and kidneys from a subset of
animals (positive control group n = 5, negative control group n = 3,
treated group n = 10, contact exposure group n = 5) did not reveal any
tissue damage or other signs of infectious processes that might have
predisposed the animals to skin infection by G. destructans.
Furthermore, live G. destructans was cultured from the skin of bats
confirmed to have WNS lesions. Development of lesions diagnostic for
WNS in the absence of other signs of disease provides the first
experimental evidence that G. destructans is a primary pathogen and causes
WNS in healthy bats.

The large-scale mortality seen in wild bat populations with WNS 
was not observed in the treated or contact exposure groups. Although 
all of the positive control animals died before the termination of the 
trial, survivorship (P = 0.72) and body mass index (BMI; P = 0.96) of 
the remaining groups did not significantly differ from the negative 
control group (Fig. 2a). The lack of WNS-related mortality in the 
treated and contact exposure groups is best explained by the short 
period of time these groups were exposed to G. destructans. On 
the basis of an analysis of wild bats submitted to the US Geological 
Figure 1 | Histological sections of representative wing membranes (periodic
acid-Schiff stain). a, Normal wing membrane of a healthy bat from the
negative control group showing no signs of fungal growth. b, c, WNS lesions,
including invasion of the underlying connective tissue by fungal hyphae
(arrows), are visible in sections from a bat with WNS from the positive control
group (b) and a bat from the treated group that developed WNS after
experimental exposure to G. destructans (c). Insets are higher magnification
images and scale bars indicate 20 µm.


Survey (USGS)-National Wildlife Health Center (NWHC) for 
diagnostic testing (January 2008 to June 2011), WNS lesions have 
seasonally first been detected during autumn (late September), just 
before the start of long-term hibernation; major mortality events 
caused by WNS have seasonally not been observed among wild bats 
until the end of January (Fig. 2b). These data indicate that mortality 


Table 1 | Development of WNS in experimentally infected bats

Treatment group      Number with WNS    Number with no    Total    Per cent
                     lesions present    WNS lesions                infected
Negative control     0                  34                34       0
Treated              29                 0                 29       100
Contact exposure     16                 2                 18       89
Airborne exposure    0                  36                36       0
Positive control     25                 0                 25       100


The data show prevalence of WNS-associated fungal infections established in groups of healthy little 
brown bats inoculated with conidia of G. destructans from pure culture or exposed to bats known to have 
WNS (positive control group). Infection status was determined by histological examination of the wing. 
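
The "per cent infected" column of Table 1 follows directly from the lesion counts. The short sketch below recomputes it from the counts given in the table; the dictionary layout and the function name are illustrative choices rather than part of the study's analysis.

```python
# Counts from Table 1: (number with WNS lesions, number with no WNS lesions).
table1 = {
    "Negative control": (0, 34),
    "Treated": (29, 0),
    "Contact exposure": (16, 2),
    "Airborne exposure": (0, 36),
    "Positive control": (25, 0),
}


def percent_infected(with_lesions, without_lesions):
    """Percentage of a group with WNS lesions, rounded to the nearest whole number."""
    total = with_lesions + without_lesions
    if total == 0:
        raise ValueError("group has no animals")
    return round(100 * with_lesions / total)


for group, (positive, negative) in table1.items():
    total = positive + negative
    print(f"{group}: {positive}/{total} = {percent_infected(positive, negative)}%")
```

For the contact exposure group this gives 16 of 18 animals, or 89%, matching the figure quoted in the text.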


2 | NATURE | VOL 000 | 00 MONTH 2011 


[Figure 2 plots. a, Proportion of experimental animals remaining versus days after treatment (0 to 100) for the treated, contact exposure, airborne exposure, negative control and positive control (asterisk) groups. b, Monthly data from July to June, with curves labelled for the treated group and the positive control group.]


Figure 2 | Survival curves. a, Survival curves for the treated (n = 29), contact 
exposure (n = 18), airborne exposure (n = 36), negative control (n = 34) and
positive control (n = 25) groups. Bats in the positive control group, which 
consisted of animals naturally infected with WNS at the time they were 
collected, exhibited significantly decreased survival (asterisk) relative to the 
other groups (P < 0.001). Survival among bats of the remaining groups did not 
differ significantly from one another (P = 0.72). b, Percentage of bats submitted 
by month (January 2008 to June 2011) to the USGS-National Wildlife Health 
Center that tested positive for WNS (n = 54 submission events). The blue bars 
represent submissions that were not associated with major mortality events; the 
red bars depict submissions associated with high mortality. Annually, WNS- 
associated mortality events are first observed in January; the number of 
submissions involving mortality events for a given month peaks in March. 
Assuming the positive control bats were first exposed to G. destructans in late 
September, mortality due to WNS did not occur in the laboratory until 
approximately 120 days after exposure, consistent with what is observed in
free-ranging wild bats (the dotted line represents the exposure period in the wild
before the animals were collected for this study). The duration of this infection 
trial (102 days) was insufficient to observe WNS-associated mortality in the 
treated and contact exposure groups (the treated group mortality curve is 
shifted such that duration of exposure corresponds to that of the positive 
control group; contact and airborne exposure group mortality curves are not 
shown). 


from WNS does not manifest until approximately 120 days after bats
enter hibernation and assume a cold physiological state conducive to
proliferation of G. destructans; mortality subsequently peaks about
180 days after bats first enter hibernacula (in the month of March).
Assuming that initial exposure of positive-control bats to the fungus
occurred in late September, these animals survived about 110 to
205 days after exposure, with approximately 50% having died by the
150-day mark. The treated and contact-exposure bats were exposed
to G. destructans for only 102 days. Thus, the experiment was
terminated before the disease had progressed to the degree that
mortality would be expected among treated and contact-exposed animals.
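
The day counts in this argument can be mapped onto calendar dates. The sketch below assumes, for illustration only, that "late September" means 25 September 2008; that date and the function name are hypothetical, and only the day offsets (110, 120, 150, 180 and 205) come from the text.

```python
from datetime import date, timedelta

# Assumed exposure date standing in for "late September" (illustrative only).
ASSUMED_EXPOSURE = date(2008, 9, 25)


def date_after_exposure(days, exposure=ASSUMED_EXPOSURE):
    """Calendar date a given number of days after the assumed exposure date."""
    return exposure + timedelta(days=days)


milestones = {
    "earliest positive-control death": 110,
    "onset of mortality in wild bats": 120,   # falls on 2009-01-23
    "about half of positive controls dead": 150,
    "peak mortality in wild bats": 180,       # falls on 2009-03-24
    "latest positive-control death": 205,
}

for label, days in milestones.items():
    print(f"{days:>3} days: {date_after_exposure(days).isoformat()}  ({label})")
```

With this assumed start date, 120 days falls in late January and 180 days in late March, consistent with the seasonal pattern described for wild bats.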

Our work demonstrates experimental infection of little brown bats
by G. destructans with subsequent development of WNS in the absence
of underlying health conditions. It follows that the recent widespread
detection of G. destructans in Europe without apparent detriment to
bat populations indicates that the fungus may be endemic to that
region where it co-evolved with continental bat species. In North
America, the data indicate that WNS originated at a single site with
high tourist traffic, consistent with the introduction of an exotic species.
Thus, the pathological effects caused by G. destructans in North
American bats may reflect exposure of a naive host population to a
novel pathogen. Future studies are needed to investigate the origin of
G. destructans in North America and to elucidate differences in
physiology and behaviour between North American and European
bats that might account for disparate disease outcomes observed
between the two continents.

Fungal pathogens have the unique capacity to drive host populations
to extinction because of their ability to survive in host-free
environments. Given the high mortality rate and speed at which
WNS has spread, the disease has the potential to decimate North
American bat populations and cause species extinctions similar to
those documented for amphibians affected by chytridiomycosis.
Advancement of WNS research and management has been limited
by uncertainty over the causative agent of this disease. With the
causative agent now conclusively identified through fulfilment of
Koch’s postulates, future research efforts can focus on mitigating the
effects of WNS before hibernating bat populations suffer losses beyond
the point of recovery.


METHODS SUMMARY 


Little brown bats (Myotis lucifugus) naturally infected with WNS (positive control
group; n = 25) were collected from a hibernaculum in New York. Healthy (based
upon body condition and histopathology findings) little brown bats were collected
from a hibernaculum in Wisconsin outside of the known range of WNS. Healthy
bats were divided into four groups: negative control (n = 34), treated (n = 29),
contact exposure (n = 18) and airborne exposure (n = 36). Conidia of G. destructans
(5 × 10⁵ conidia suspended in 20 µl of phosphate buffered saline solution
containing 0.5% Tween 20 (PBST)) were applied to one of the wings of bats in the treated
group, and an additional 5 × 10⁵ conidia were applied to the fur between the eye
and ear. Negative control bats were treated identically with PBST lacking conidia.
Animals were maintained in mesh enclosures (Supplementary Fig. 1) under
conditions approximating bat hibernacula for 102 days. The experimental end point
was set to correspond with the timing by which wild bats naturally emerge from
hibernation. Infection status was determined by histological examination of the
muzzle and skin from each wing, and G. destructans was re-isolated from
wing skin as previously described. The identity of fungal isolates resembling
G. destructans was confirmed by PCR amplification/double-stranded sequence
analysis of the rRNA gene internal transcribed spacer.


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS

Animals. This study was conducted at the NWHC in accordance with
Institutional Animal Care and Use Committee Experimental Protocol 081118.
WNS-positive little brown bats (Myotis lucifugus) (positive control group;
n = 25) were collected from a hibernaculum in New York in January 2009; only
bats showing visible signs of fungal growth on the muzzle and/or wings were
collected from the New York site. Healthy (based upon body condition and
histopathology findings) little brown bats were collected from a hibernaculum in
Wisconsin (approximately 1,000 km distant from the known range of WNS at
the time that animals were collected). Bats were transported to the NWHC in
coolers at approximately 7 °C.

Experimental infection. Healthy bats were randomly (except for ensuring nearly 
equal sex ratios) divided into groups: negative control (n = 34), treated (n = 29), 
contact exposure (n = 18) and airborne exposure (n = 36). Negative control, 
positive control and treated groups were maintained in separate rooms. 
Animals in the contact exposure group were placed in the same enclosures as 
the positive control group. Animals in the airborne exposure group were split 
evenly between separate enclosures, each located 1.3 cm from enclosures housing 
the positive control and treated groups (Supplementary Fig. 1). 

Conidia were harvested from 60-day-old cultures of the type strain of G.
destructans (American Type Culture Collection number ATCC MYA-4855) by
flooding plates with phosphate buffered saline solution containing 0.5% Tween 20
(PBST). Conidia were washed, enumerated and re-suspended in PBST. Twenty
microlitres of the conidial suspension containing 5 × 10⁵ conidia were pipetted
directly onto the dorsal surface of one of the wings of bats in the treated group; an
additional aliquot (20 µl) was pipetted onto the fur between the eye and ear.
Negative control bats were treated identically with PBST lacking conidia. Bats
were housed in mesh enclosures (Reptaria; Apogee) within refrigerators (SRC
Refrigeration) under conditions approximating bat hibernacula (complete
darkness, approximately 6.5 °C and 82% relative humidity) for 102 days. Termination
of the experiment corresponded to the time period during which wild bats begin to
emerge from hibernation. Temperatures were recorded daily in each refrigerator
to ensure that appropriate hibernation conditions were maintained. The mean
temperatures (± standard deviation) for the refrigerators were as follows: negative
control group, 6.4 ± 0.8 °C; positive control, airborne exposure (in part) and
contact exposure groups, 6.7 ± 0.4 °C; treated and airborne (in part) exposure
groups, 6.4 ± 0.8 °C. BMI was calculated by dividing body mass at the time that
the bats were euthanized by forearm length. Because animals that died naturally
during the trial became desiccated, BMI was only calculated for bats that were
euthanized.
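
The BMI definition used here (body mass at euthanasia divided by forearm length, computed only for euthanized animals) can be written as a small function. This is a minimal sketch: the units (grams and millimetres), the function name and the example values are assumptions for illustration and are not taken from the study.

```python
from typing import Optional


def body_mass_index(mass_g: float, forearm_mm: float, euthanized: bool) -> Optional[float]:
    """Body mass divided by forearm length, or None for bats that died naturally.

    Bats that died during the trial became desiccated, so no BMI is reported for them.
    Units (grams, millimetres) are assumed for illustration.
    """
    if not euthanized:
        return None
    if forearm_mm <= 0:
        raise ValueError("forearm length must be positive")
    return mass_g / forearm_mm


# Hypothetical example values, not measurements from the study.
print(body_mass_index(7.6, 38.0, euthanized=True))
print(body_mass_index(6.1, 37.5, euthanized=False))
```

Returning None for animals that died naturally keeps them out of any downstream comparison of BMI among groups, mirroring the exclusion described in the text.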

Diagnosis of WNS was made through histological examination of the muzzle
and a portion of skin from each wing. G. destructans was re-isolated in culture
from wing skin as described previously and identified by PCR amplification/
double-stranded sequence analysis of the rRNA gene internal transcribed spacer.
Statistical analyses. Survivorship was compared among groups using the
Gehan-Breslow survival test (SigmaPlot 11.0; Systat Software) because this method gives
more weight to animals that died naturally during the experiment and less weight
to the large number of censored data points (that is, euthanized animals) at the end
of the experiment. Pair-wise comparisons were examined with the Holm-Sidak
procedure (significance at P < 0.05). BMI was compared among groups (negative
control group, n = 27; treated group, n = 25; contact exposure group, n = 15;
airborne exposure group, n = 27) using an analysis of variance test (significance
at P < 0.05) after confirming that the data met assumptions of normality
(Shapiro-Wilk test, P = 0.07) and equal variances (Levene median test,
P = 0.87). One bat from the treated group was excluded from the BMI analysis
because its weight was not measured before euthanasia and sample collection.
Three bats from the treated group were euthanized 34 days after exposure to assess
whether WNS lesions were developing; WNS lesions were not detected in these
animals. Because these three animals were prematurely removed from the
experiment, they were excluded from further analyses and are not represented in the
specified sample sizes.
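
For readers unfamiliar with the Holm-Sidak procedure, the sketch below shows one common formulation: step-down Šidák-adjusted P values computed from a set of pair-wise comparisons. It is an illustration under stated assumptions, not a reproduction of what SigmaPlot does internally; the function name and the example P values are hypothetical and are not results from this study.

```python
def holm_sidak(p_values):
    """Step-down Sidak (Holm-Sidak) adjusted P values, returned in the input order.

    The i-th smallest of m P values is adjusted as 1 - (1 - p)**(m - i + 1),
    and adjusted values are forced to be non-decreasing along the sorted order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = 1.0 - (1.0 - p_values[idx]) ** (m - rank)
        running_max = max(running_max, candidate)
        adjusted[idx] = min(running_max, 1.0)
    return adjusted


# Hypothetical raw P values for three pair-wise comparisons.
raw = [0.01, 0.04, 0.03]
for p, p_adj in zip(raw, holm_sidak(raw)):
    verdict = "significant" if p_adj < 0.05 else "not significant"
    print(f"raw P = {p:.3f}, adjusted P = {p_adj:.4f} ({verdict})")
```

A comparison is declared significant when its adjusted P value falls below the chosen threshold (0.05 in the text), which controls the family-wise error rate across all pair-wise tests.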
Equipment and settings. Prepared tissue sections were examined using an
Olympus BH-2 upright microscope with SPlan Apo ×40 and ×100 objectives
(Olympus Optical). Images were collected in tagged image file format using a
digital colour camera (Insight2) and Spot Basic Version 4.0.8 (Diagnostic
Instruments).